All of Ronny Fernandez's Comments + Replies

Yeah, I’m totally with you that it definitely isn’t actually next-token prediction; it’s some totally other goal drawn from the distribution of goals you get when you run SGD to minimize next-token prediction surprise.

I think we are pretty much on the same page! Thanks for the example of the ball-moving AI, that was helpful. I think I only have two things to add: 1. Reward is not the optimization target, and in particular just because an LLM was trained by changing it to predict the next token better, doesn't mean the LLM will pursue that as a terminal goal. During operation an LLM is completely divorced from the training-time reward function, it just does the calculations and reads out the logits. This differs from a proper "goal" because we don't need to worry about the LLM trying to wirehead by feeding itself easy predictions. In contrast, if we call up 2. To the extent we do say the LLM's goal is next token prediction, that goal maps very unclearly onto human-relevant questions such as "is the AI safe?". Next-token prediction contains multitudes, and in OP I wanted to push people towards "the LLM by itself can't be divorced from how it's prompted".

So the shoggoth here is the actual process that gets low loss on token prediction. Part of the reason that it is a shoggoth is that it is not the thing that does the talking. Seems like we are on board here.

The shoggoth is not an average over masks. If you want to see the shoggoth, stop looking at the text on the screen and look at the input token sequence and then the logits that the model spits out. That's what I mean by the behavior of the shoggoth. 

On the question of whether it's really a mind, I'm not sure how to tell. I know it gets really ...

David Johnston, 20d (1 point)
1. We can definitely implement a probability distribution over text as a mixture of text generating agents. I doubt that an LLM is well understood as such in all respects, but thinking of a language model as a mixture of generators is not necessarily a type error. 2. The logits and the text on the screen cooperate to implement the LLM's cognition. Its outputs are generated by an iterated process of modelling completions, sampling them, then feeding the sampled completions back to the model.
I think we're largely on the same page here because I'm also unsure of how to tell! I think I'm asking for someone to say what it means for the model itself to have a goal separate from the masks it is wearing, and show evidence that this is the case (rather than the model "fully being the mask"). For example, one could imagine an AI with the secret goal "maximize paperclips" which would pretend to be other characters but always be nudging the world towards paperclipping, or human actors who perform in a way supporting the goal "make my real self become famous/well-paid/etc" regardless of which character they play. Can someone show evidence for the LLMs having a "real self" or a "real goal" that they work towards across all the characters they play? I suppose I'm trying to make a hypothetical AI that would frustrate any sense of "real self" and therefore disprove the claim "all LLMs have a coherent goal that is consistent across characters". In this case, the AI could play the "benevolent sovereign" character or the "paperclip maximizer" character, so if one claimed there was a coherent underlying goal I think the best you could say about it is "it is trying to either be a benevolent sovereign or maximize paperclips". But if your underlying goal can cross such a wide range of behaviors it is practically meaningless! (I suppose these two characters do share some goals like gaining power, but we could always add more modes to the AI like "immediately delete itself" which shrinks the intersection of all the characters' goals.)

The shoggoth is supposed to be of a different type than the characters. The shoggoth, for instance, does not speak English; it only knows tokens. There could be a shoggoth character, but it would not be the real shoggoth. The shoggoth is the thing that gets low loss on the task of predicting the next token. The characters are patterns that emerge in the history of that behavior.

I agree that to the extent there is a shoggoth, it is very different than the characters it plays, and an attempted shoggoth character would not be "the real shoggoth". But is it even helpful to think of the shoggoth as being an intelligence with goals and values? Some people are thinking in those terms, e.g. Eliezer Yudkowsky saying that "the actual shoggoth has a motivation Z". To what extent is the shoggoth really a mind or an intelligence, rather than being the substrate on which intelligences can emerge? And to get back to the point I was trying to make in OP, what evidence do we have that favors the shoggoth being a separate intelligence? To rephrase: behavior is a function of the LLM and prompt (the "mask"), and with the correct LLM and prompt together we can get an intelligence which seems to have goals and values. But is it reasonable to "average over the masks" to get the "true behavior" of the LLM alone? I don't think that's necessarily meaningful since it would be so dependent on the weighting of the average. For instance, if there's an LLM-based superintelligence that becomes a benevolent sovereign (respectively, paperclips the world) if the first word of its prompt has an even (respectively, odd) number of letters, what would be the shoggoth there?
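To make the weighting point concrete, here is a minimal Python sketch of the hypothetical parity-switched system. The function names, the prompts, and the weights are all made up for illustration; nothing here models a real LLM. It only shows that the "average over masks" can come out as almost anything, depending on which distribution over prompts you pick.

```python
# Toy illustration only: every name, prompt, and weight below is hypothetical.

def behavior(prompt: str) -> str:
    """Mode of the imagined system: decided by the parity of the number
    of letters in the first word of the prompt."""
    first_word = prompt.split()[0]
    n_letters = sum(ch.isalpha() for ch in first_word)
    return "benevolent sovereign" if n_letters % 2 == 0 else "paperclip maximizer"


def averaged_behavior(weighted_prompts: dict[str, float]) -> dict[str, float]:
    """Probability of each mode under a given weighting over prompts."""
    total = sum(weighted_prompts.values())
    result = {"benevolent sovereign": 0.0, "paperclip maximizer": 0.0}
    for prompt, weight in weighted_prompts.items():
        result[behavior(prompt)] += weight / total
    return result


# Two arbitrary weightings over the same two prompts.
prompts_a = {"Please help me": 0.9, "Hey there": 0.1}
prompts_b = {"Please help me": 0.1, "Hey there": 0.9}

print(averaged_behavior(prompts_a))  # mostly "benevolent sovereign"
print(averaged_behavior(prompts_b))  # mostly "paperclip maximizer"
```

The same system looks mostly benevolent or mostly paperclipping depending only on the prompt weighting, which is the sense in which the average says more about the weighting than about the system.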

Yeah I think this would work if you conditioned on all of the programs you check being exactly equally intelligent. Say you have a hundred superintelligent programs in simulations and one of them is aligned, and they are all equally capable, then the unaligned ones will be slightly slower in coming up with aligned behavior maybe, or might have some other small disadvantage. 

However, in the challenge described in the post it's going to be hard to tell a level 999 aligned superintelligence from a level 1000 unaligned superintelligence.

I think the advant...

Quick submission:

The first two prongs of OAI's approach seem to be aiming to get a human-values-aligned training signal. Let us suppose that there is such a thing, and ignore the difference between a training signal and a utility function, both of which I think are charitable assumptions for OAI. Even if we could search the space of all models and find one that in simulations does great on maximizing the correct utility function which we found by using ML to amplify human evaluations of behavior, that is no guarantee that the model we find in that search ...

This is an intuition based only on speaking with researchers working on LLMs, but I think that OAI thinks that a model can simultaneously be good enough at next-token prediction to assist with research and still be very far away from being a powerful enough optimizer to realise that it is being optimized for a goal, or that deception is an optimal strategy, since the latter two capabilities require much more optimization power. On this view, the default state of cutting-edge LLMs for the next few years is to have GPT-3 levels of deception (essentially none) and graduate-student levels of research-assistant ability.
Ronny Fernandez, 4mo (1 point)
This inspired a full-length post.

I loved this, but maybe it should come with a CW.

I would think the title is itself a content warning.

I guess someone might think this post is or could be far more abstract and less detailed about the visceral realities than it is (or maybe even just using the topic as a metaphor at most).

What kind of specific content warning do you think would be appropriate? Maybe "Describes the dissection of human bodies in vivid concrete terms."?

I came here to say something pretty similar to what Duncan said, but I had a different focus in mind. 

It seems like it's easier for organizations to coordinate around PR than it is for them to coordinate around honor. People can have really deep, intractable, or maybe even fundamental and faultless, disagreements about what is honorable, because what is honorable is a function of what normative principles you endorse. It's much easier to resolve disagreements about what counts as good PR. You could probably settle most disagreements about what co...

As a counterpoint, one writer thinks that it's psychologically harder for organizations to think about PR:

A famous investigative reporter once asked me why my corporate clients were so terrible at defending themselves during controversy. I explained, “It’s not what they do. Companies make and sell stuff. They don’t fight critics for a living. And they dread the very idea of a fight. Critics criticize; it’s their entire purpose for existing; it’s what they do.”

“But the companies have all that money!” he said, exasperated.

“But their critics have you,” I said

...

It's much easier to resolve disagreements about what counts as good PR.

I mostly disagree. I mean, maybe this applies in comparison to “honor” (not sure), but I don’t think it applies in comparison to “reputation” in many of the relevant senses. A person or company could reasonably wish to maintain a reputation as a maker of solid products that don’t break, or as a reliable fact-checker, or some other such specific standard. And can reasonably resolve internal disagreements about what is and isn’t likely to maintain this reputation.

If it was actually ...

This seems true to me but also sort of a Moloch-style dynamic?  Like "yep, I agree those are the incentives, and it's too bad that that's the case."

This might be sort of missing the point, but here is an ideal and maybe not very useful not-yet-theory of rationality improvements I just came up with.

There are a few black boxes in the theory. The first takes you and returns your true utility function, whatever that is. Maybe it's just the utility function you endorse, and that's up to you. The other black box is the space of programs that you could be. Maybe it's limited by memory, maybe it's limited by run time, or maybe it's any finite state machine with fewer than 10^20 states...

The main thing I want to point out is that this is an idealized notion of non-idealized decision theory -- in other words, it's still pretty useless to me as a bounded agent, without some advice about how to approximate it. I can't very well turn into this max-expected-value bounded policy. But there are other barriers, too. Figuring out what utility function I endorse is a hard problem. And we face challenges of embedded decision theory; how do we reason about the counterfactuals of changing our policy to the better one? Modulo those concerns, I do think your description is roughly right, and carries some important information about what it means to self-modify in a justified way rather than cargo-culting.

I don't think we should be surprised that any reasonable utility function is uncomputable. Consider a set of worlds with utopias that last only as long as a Turing machine in the world does not halt and are otherwise identical. There is one such world for each Turing machine. All of these worlds are possible. No computable utility function can assign higher utility to every world with a never halting Turing machine.

I do think this is an important concept to explain our conception of goal-directedness, but I don't think it can be used as an argument for AI risk, because it proves too much. For example, for many people without technical expertise, the best model they have for a laptop is that it is pursuing some goal (at least, many of my relatives frequently anthropomorphize their laptops).

This definition is also supposed to explain why a mouse has agentic behavior, and I would consider it a failure of the definition if it implied that mice are dangerous. I think a system becomes more dangerous as your best model of that system as an optimizer increases in optimization power.

Here is an idea for a disagreement resolution technique. I think this will work best:

*with one other partner you disagree with.

*when the beliefs you disagree about are clearly about what the world is like.

*when the beliefs you disagree about are mutually exclusive.

*when everybody genuinely wants to figure out what is going on.

Probably doesn't really require all of those though.

The first step is that you both write out your beliefs on a shared work space. This can be a notebook or a whiteboard or anything like that. Then you each write do...

Ok, let me give it a try. I am trying to not spend too much time on this, so I prefer to start with a rough draft and see whether there is anything interesting here before I write a massive essay.

You say the following:

Do chakras exist?

In some sense I might be missing the point since the answer to this is basically just "no". Though obviously I still think they form a meaningful category of something, but in my model they form a meaningful category of "mental experiences" and "mental procedures", and definitely not a meaningfu...

If you come up with a test or set of tests that it would be impossible to actually run in practice, but that we could do in principle if money and ethics were no object, I would still be interested in hearing those. After talking to one of my friends who is enthusiastic about chakras for just a little bit, I would not be surprised if we in fact make fairly similar predictions about the results of such tests.

Sometimes I sort of feel like a grumpy old man that read the sequences back in the good old fashioned year of 2010. When I am in that mood I will sometimes look around at how memes spread throughout the community and say things like "this is not the rationality I grew up with". I really do not want to stir things up with this post, but I guess I do want to be empathetic to this part of me and I want to see what others think about the perspective.

One relatively small reason I feel this way is that a lot of really smart rationalists, who are my fr...

I am not one of the Old Guard, but I have an uneasy feeling about something related to the Chakra phenomenon.

It feels like there's a lot of hidden value clustered around wooy topics like Chakras and Tulpas, and the right orientation towards these topics seems fairly straightforward: if it calls out to you, investigate and, if you please, report. What feels less clear to me is how I as an individual or as a member of some broader rat community should respond when, according to me, people do not pass certain forms of bullshit tests.

This comes from someone wi...

lol on the grumpy old man part, I feel that sometimes :) I'm not really familiar with what chakras are supposed to be about, but I'm decently familiar with yoga (200h level training several years ago). For the first 2/3 of the training we just focused on movement and anatomy, and the last 1/3 was teaching and theory. My teacher told me that there was this stuff called prana that flowed through living beings, and that breath work was all about getting the right prana flow. I thought that was a bit weird, but the breathing techniques we actually did also had lovely and noticeable effects on my mood/body. My frame: some woo frameworks came about through X years of experimentation and finding lots of little tweaks that work, and then the woo framework co-evolved, or came afterwards, as a way to tie together all these disjointed bits of accumulated knowledge. So when I go to evaluate something like chakras, I treat the actual theory as secondary to the actual pointers, "how chakras tell me to live my life". Now, any given woo framework may or may not have that many useful accumulated tidbits; that's where we have to try it for ourselves and see if it works. I've done enough yoga to be incredibly confident that though prana may not carve reality at the joints or be real, I'm happy to ask a master yogi how to handle my body better. Hmmmmm, so I guess the thing I wanted to say to you was, when having this chakra discussion with whomever, make sure to ask them, "What are the concrete things chakras tell me to do with my body/mind" and then see if those things have any effect.
Ronny Fernandez, 4y (1 point)
If you come up with a test or set of tests that it would be impossible to actually run in practice, but that we could do in principle if money and ethics were no object, I would still be interested in hearing those. After talking to one of my friends who is enthusiastic about chakras for just a little bit, I would not be surprised if we in fact make fairly similar predictions about the results of such tests.
I have some thoughts about this (as someone who isn't really into the chakra stuff, but feels like it's relatively straightforward to answer the meta-questions that you are asking here). Feel free to ping me in a week if I haven't written a response to this.

Here is an idea I just thought of in an Uber ride for how to narrow down the space of languages it would be reasonable to use for universal induction. To express the K-complexity of an object relative to a programming language I will write:

Suppose we have two programming languages. The first is Python. The second is Qython, which is a lot like Python, except that it interprets the string "A" as a program that outputs some particular algorithmically large random-looking character string with . I claim that intuitively, Pyth...

When I started writing this comment I was confused. Then I got myself fairly less confused I think. I am going to say a bunch of things to explain my confusion, how I tried to get less confused, and then I will ask a couple questions. This comment got really long, and I may decide that it should be a post instead.

Take a system with 8 possible states. Imagine it is like a simplified Rubik's cube type puzzle. (Thinking about mechanical Rubik's cube solvers is how I originally got confused, but using actual Rubik's cubes to explain would make...

Is there a particular formula for negentropy that OP has in mind? I am not seeing how the log of the inverse of the probability of observing an outcome as good or better than the one observed can be interpreted as the negentropy of a system with respect to that preference ordering.

Edit: Actually, I think I figured it out, but I would still be interested in hearing what other people think.
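For anyone else puzzling over the same quantity, here is a minimal Python sketch of "the log of the inverse of the probability of observing an outcome as good or better than the one observed". It assumes a uniform measure over a finite state space and a made-up utility ranking over 8 states; the function name and the numbers are illustrative, not taken from the OP.

```python
import math


def optimization_power_bits(utilities, observed_index):
    """-log2 of the fraction of states ranked at least as high as the
    observed one, assuming every state is equally likely a priori."""
    observed = utilities[observed_index]
    at_least_as_good = sum(1 for u in utilities if u >= observed)
    return -math.log2(at_least_as_good / len(utilities))


# Hypothetical preference ordering over 8 states (higher is better).
utilities = [0, 1, 2, 3, 4, 5, 6, 7]

best_bits = optimization_power_bits(utilities, 7)   # 1 of 8 states qualifies
worst_bits = optimization_power_bits(utilities, 0)  # all 8 states qualify

print(best_bits)   # 3.0
print(worst_bits)  # 0.0 (printed as -0.0)
```

Hitting the single best of 8 equally likely states counts as 3 bits, and landing in the worst state counts as 0 bits, so the measure behaves like a surprisal taken with respect to the preference ordering rather than with respect to individual microstates.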

Something about your proposed decision problem seems cheaty in a way that the standard Newcomb problem doesn't. I'm not sure exactly what it is, but I will try to articulate it, and maybe you can help me figure it out.

It reminds me of two different decision problems. Actually, the first one isn't really a decision problem.

Omega has decided to give all those who two-box on the standard Newcomb problem 1,000,000 USD, and all those who do not 1,000 USD.

Now that's not really a decision problem, but that's not the issue with using it ...

I think I see where you're coming from with the inverse problem feeling "cheaty". It's not like other decision problems in the sense that it is not really a dilemma; two-boxing is clearly the best option. I used the word "problem" instinctively, but perhaps I should have called it the "Inverse Newcomb Scenario" or something similar instead. However, the fact that it isn't a real "problem" doesn't change the conclusion. I admit that the inverse scenario is not as interesting as the standard problem, but what matters is that it's just as likely, and clearly favours two-boxers. FDT agents have a pre-commitment to being one-boxers, and that would work well if the universe actually complied and provided them with the scenario they have prepared for (which is what the paper seems to assume). What I tried to show with the inverse scenario is that it's just as likely that their pre-commitment to one-boxing will be used against them. Both Newcomb's Problem and the Inverse Scenario are "unfair" for one of the theories, which is why I think the proper performance measure is the total money for going through both, where CDT comes out on top.

I had already proved it for two values of H before I contacted Sellke. How easily does this proof generalize to multiple values of H?

Samuel Hapák, 4y (3 points)
Very simple. To prove it for an arbitrary number of values, you just need to prove that h_i being true increases its expected “probability to be assigned” after measurement for each i. If you define T as h_i and F as NOT h_i, you have just reduced the problem to the two-value version.

I see. I think you could also use PPI to prove Good's theorem though. Presumably the reason it pays to get new evidence is that you should expect to assign more probability to the truth after observing new evidence?

I honestly could not think of a better way to write it. I had the same problem when my friend first showed me this notation. I thought about using but that seemed more confusing and less standard? I believe this is how they write things in information theory, but those equations usually have logs in them.

Just to add an additional voice here, I would view that as incorrect in this context, instead referring to the thing that the CEE is saying. The way I'd try to clarify this would be to put the variables varying in the expectation in subscripts after the E, so the CEE equation would look like $E_D[P(H=h_i \mid D)] = P(H=h_i)$, and the PPI inequality would be $E_{(H,D)}[P(H \mid D)] \geq E_H[P(H)]$.
Yeah, this is the one that I would have used.
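With the subscripts written out, both statements are easy to check numerically. The sketch below is only an illustration under my own assumptions: it draws an arbitrary random joint distribution over a finite H and D (the helper names and the sizes are made up), then computes each side of the CEE equation and of the PPI inequality directly from their definitions.

```python
import random


def random_joint(n_h, n_d, rng):
    """A hypothetical random joint distribution P(H=h, D=d) as a nested list."""
    raw = [[rng.random() for _ in range(n_d)] for _ in range(n_h)]
    total = sum(sum(row) for row in raw)
    return [[x / total for x in row] for row in raw]


def cee_and_ppi(joint):
    """Return (P(H), E_D[P(H=h|D)] per h, E_{(H,D)}[P(H|D)], E_H[P(H)])."""
    n_h, n_d = len(joint), len(joint[0])
    p_h = [sum(joint[h]) for h in range(n_h)]
    p_d = [sum(joint[h][d] for h in range(n_h)) for d in range(n_d)]
    posterior = [[joint[h][d] / p_d[d] for d in range(n_d)] for h in range(n_h)]
    # CEE: average the posterior of each fixed hypothesis over the data D.
    cee_lhs = [sum(p_d[d] * posterior[h][d] for d in range(n_d)) for h in range(n_h)]
    # PPI: average the posterior assigned to the *true* hypothesis over (H, D).
    ppi_lhs = sum(joint[h][d] * posterior[h][d] for h in range(n_h) for d in range(n_d))
    # Prior probability assigned to the true hypothesis, averaged over H.
    ppi_rhs = sum(p * p for p in p_h)
    return p_h, cee_lhs, ppi_lhs, ppi_rhs


rng = random.Random(0)
p_h, cee_lhs, ppi_lhs, ppi_rhs = cee_and_ppi(random_joint(3, 4, rng))
print(p_h, cee_lhs)      # the two lists agree up to rounding (CEE)
print(ppi_lhs, ppi_rhs)  # the first is at least the second (PPI)
```

A numerical check on one random distribution is of course not a proof; it is just a quick way to confirm that the subscripted notation says what we think it says.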

I didn't take the time to check whether it did or didn't. If you would walk me through how it does, I would appreciate it.

Good shows that, for every utility function and every situation, the expected utility increases or stays the same when you gain information. If we can construct a utility function whose expected utility always equals the expected probability assigned to the correct hypothesis, we can transfer the conclusion. That was my idea when I made the comment. Here is that utility function: first, the agent mentally assigns a positive real number $r(h_i)$ to every hypothesis $h_i$, such that $\sum_i r(h_i) = 1$. It prefers any world where it does this to any where it doesn't. Its utility function is $2r(H) - \sum_j r(h_j)^2$. This is the quadratic scoring rule, which is proper, so the agent maximizes expected utility by setting $r(h_i) = P(h_i)$. Then its expected utility is $\sum_i P(h_i)\left[2P(h_i) - \sum_j P(h_j)^2\right]$. Simplifying: $2\sum_i P(h_i)^2 - \sum_j P(h_j)^2 \sum_i P(h_i)$. And since $\sum_i P(h_i) = 1$, this is $\sum_i P(h_i)^2$, which is just $E[P(H)]$.
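The algebra above can be sanity-checked with a few lines of Python. This is a minimal sketch with a made-up three-hypothesis belief; the function names are mine. It confirms that an honest report earns an expected quadratic score of exactly the sum of squared probabilities, and that one arbitrary dishonest report does worse.

```python
def quadratic_score(report, true_index):
    """Quadratic score 2*r(H) - sum_j r(h_j)^2 when hypothesis true_index is true."""
    return 2 * report[true_index] - sum(r * r for r in report)


def expected_score(belief, report):
    """Expected quadratic score of `report` when H is distributed as `belief`."""
    return sum(p * quadratic_score(report, i) for i, p in enumerate(belief))


belief = [0.5, 0.3, 0.2]  # hypothetical P(h_i)

honest = expected_score(belief, belief)
sum_of_squares = sum(p * p for p in belief)  # this is E[P(H)]
shaded = expected_score(belief, [0.6, 0.3, 0.1])  # an arbitrary dishonest report

print(honest, sum_of_squares)  # equal up to rounding
print(shaded)                  # strictly smaller than the honest score
```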

Luckily, I don't know much about genetics. I totally forgot that, I'll edit the question to reflect it.

To be sure though, did what I mean about the different kinds of cognition come across? I do not actually plan on teaching any genetics.

Yeah, it came across.

Yeah, the problem I have with that though is that I'm left asking: why did I change my probability in that? Is it because I updated on something else? Was I certain of that something else? If not, then why did I change my probability of that something else, and on we go down the rabbit hole of an infinite regress.

Presumably because you got some new information. If there is no information, there is no update. If the information is uncertain, make appropriate adjustments. The "infinite regress" would either converge to some limit or you'll end up, as OrphanWilde says, with Descartes' deceiving demon at which point you don't know anything and just stand there slack-jawed till someone runs you over.
The infinite regress is anticipated in one of your priors. You're playing a game. Variant A of an enemy attacks high most of the time, variant B of an enemy attacks low some of the time; the rest of the time they both do forward attacks. We have priors, which we can arbitrarily set at any value. The enemy does a forward attack; here, we assign 100% probability to our observation of the forward attack. But let's say we see it out of the corner of our eye; in that case, we might assign 60% probability to the forward attack, but we still have 100% probability on the observation itself. Add an unreliable witness recounting the attack they saw out of the corner of their eye; we might assign 50% probability to their telling the truth, but 100% probability that we heard them. Add in a hearing problem; now we might assume 90% probability we heard them correctly, but 100% probability that we heard them at all. We can keep adding levels of uncertainty, true. Eventually we will arrive at the demon-that-is-deliberately-deceiving-us thing Descartes talks about, at which point we can't be certain of anything except our own existence. Infinite regress results in absolutely no certainty. But infinite regress isn't useful; lack of certainty isn't useful. We can't prove the existence of the universe, but we can see, quite obviously, the usefulness of assuming the universe does exist. Which is to say, probability doesn't exist in a vacuum; it serves a purpose. Or, to approach it another way: Gödel. We can't be absolutely certain of our probabilities because at least one of our probabilities must be axiomatic.
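One way to read the enemy example is that the uncertain glimpse is itself treated as the thing observed with certainty, and the doubt about what really happened lives in a likelihood. Here is a minimal Python sketch of that reading. All numbers are assumptions of mine (the comment gives none), and modelling the glimpse as a channel that reports the true attack with some fixed reliability, and otherwise one of the other two attacks at random, is a simplification chosen for illustration.

```python
# All probabilities below are hypothetical, chosen only to make the example run.
ATTACKS = ("high", "forward", "low")
PRIOR = {"A": 0.5, "B": 0.5}
ATTACK_GIVEN_VARIANT = {
    "A": {"high": 0.7, "forward": 0.3, "low": 0.0},  # A attacks high most of the time
    "B": {"high": 0.0, "forward": 0.6, "low": 0.4},  # B attacks low some of the time
}


def posterior_variant_a(glimpse, reliability):
    """P(variant A | we certainly *had the experience* of glimpsing `glimpse`).

    The glimpse matches the real attack with probability `reliability`;
    otherwise it shows one of the other attacks uniformly at random.
    """
    def glimpse_likelihood(variant):
        total = 0.0
        for attack in ATTACKS:
            if attack == glimpse:
                p_report = reliability
            else:
                p_report = (1 - reliability) / (len(ATTACKS) - 1)
            total += ATTACK_GIVEN_VARIANT[variant][attack] * p_report
        return total

    joint = {v: PRIOR[v] * glimpse_likelihood(v) for v in PRIOR}
    return joint["A"] / sum(joint.values())


clear_view = posterior_variant_a("forward", reliability=1.0)
corner_of_eye = posterior_variant_a("forward", reliability=0.6)
print(clear_view, corner_of_eye)
```

Each extra layer (the witness, the hearing problem) would add one more likelihood of the same shape, and the regress stops wherever you decide a layer is reliable enough to condition on directly.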

Wait, actually, I'd like to come back to this. What programming language are we using? If it's one where either grue is primitive, or one where there are primitives that make grue easier to write than green, then grue seems simpler than green. How do we pick which language we use?

Here's my problem. I thought we were looking for a way to categorize meaningful statements. I thought we had agreed that a meaningful statement must be interpretable as or consistent with at least one DAG. But now it seems that there are ways the world can be which cannot be interpreted as even one DAG because they require a directed cycle. So have we now decided that a meaningful sentence must be interpretable as a directed, cyclic or acyclic, graph?

In general, if I say all and only statements that satisfy P are meaningful, then any statement that doesn't ...

"Markov" is used in the standard memoryless sense. By definition, the graph G represents any distribution p where each variable on the graph is independent of its past given its parents. This is the Markov property. Ilya is discussing probability distributions p that may or may not be represented by graph G. If every variable in p is independent of its past given its parents in G, then you can use d-separation in G to reason about independences in p.

Does EY give his own answer to this elsewhere?

Try to guess what he would say before reading it. You can also click on one of the tags above to read, say, the sequence on epistemology.

Wait... this will seem stupid, but can't I just say: "there does not exist x where sx = 0"


[This comment is no longer endorsed by its author]

Here's a new strategy.

Use Guess culture as a default. Use Guess tricks to figure out whether the other communicator speaks Ask. Use Ask tricks to figure out whether the communicator speaks Tell.

Autism turns this into Hard Mode, Boss Level.

Let's forget about the oracle. What about the program that outputs X only if 1 + 1 = 2, and else prints 0? Let's call it A(1,1). The formalism requires that P(X|A(1,1)) = 1, and it requires that P(A(1,1)) = 2^-K(A(1,1)), but does it need to know that "1 + 1 = 2" is somehow proven by A(1,1) printing X?

In either case, you've shown me something that I explicitly doubted before: one can prove any provable theorem if they have access to a Solomonoff agent's distribution, and they know how to make a program that prints X iff theorem S is provable. All they have to do is check the probability the agent assigns to X conditional on that program.

Awesome. I'm pretty sure you're right; that's the most convincing counterexample I've come across.

I have a weak doubt, but I think you can get rid of it:

let's name the program FTL()

I'm just not sure this means that the theorem itself is assigned a probability. Yes, I have an oracle, but it doesn't assign a probability to a program halting; it tells me whether it halts or not. What the Solomonoff formalism requires is that "if (halts(FTL()) == true) then P(X|FTL()) = 1" and "if (halts(FTL()) == false) then P(X|FTL()) = 0" and "P(FTL...

Terminology quibble:

I get where you get this notion of connotation from, but there's a more formal one that Quine used, which is at least related. It's the difference between an extension and a meaning. So the extensions of "vertebrate" and "things with tails" could have been identical, but that would not mean that the two predicates have the same meanings. To check if the extensions of two terms are identical, you check the world; it seems like to check whether two meanings are identical, you have to check your own mind.

Edit: Whoops, somebody already mentioned this.

I agree. I am saying that we need not assign it a probability at all. Your solution assumes that there is a way to express "two" in the language. Also, the proposition you made is more like "one elephant and another elephant makes two elephants" not "1 + 1 = 2".

I think we'd be better off trying to find a way to express 1 + 1 = 2 as a boolean function on programs.

This goes into the "shit LW people say" collection :-)

This is super interesting. Is this based on UDT?

Yeah, it's UDT in a logic setting. I've posted about a similar idea on the MIRI research forum here.

How do you express, Fermat's last theorem for instance, as a boolean combination of the language I gave, or as a boolean combination of programs? Boolean algebra is not strong enough to derive, or even express all of math.

edit: Let's start simple. How do you express 1 + 1 = 2 in the language I gave, or as a boolean combination of programs?

Probability that there are two elephants given one on the left and one on the right. In any case, if your language can't express Fermat's last theorem then of course you don't assign a probability of 1 to it, not because you assign it a different probability, but because you don't assign it a probability at all.

Except that around 2% of blue egg-shaped objects contain palladium instead. So if you find a blue egg-shaped thing that contains palladium, should you call it a "rube" instead? You're going to put it in the rube bin—why not call it a "rube"?

But when you switch off the light, nearly all bleggs glow faintly in the dark. And blue egg-shaped objects that contain palladium are just as likely to glow in the dark as any other blue egg-shaped object.

So if you find a blue egg-shaped object that contains palladium, and you ask "Is it a b

...

Here's a question: if we had the ability to input a sensory event with a likelihood ratio of 3^^^^3:1, would this whole problem be solved?

Assuming the rest of our cognitive capacity is improved commensurably then yes, problem solved. Mind you we would then be left with the problem if a Matrix Lord appears and starts talking about 3^^^^^3.

Hmm, it depends on whether or not you can give finite complete descriptions of those algorithms; if so, I don't see the problem with just tagging them on. If you can give finite descriptions of the algorithm, then its Kolmogorov complexity will be finite, and the prior 2^-K(h) will still give nonzero probabilities to hyper environments.

If there are no such finite complete descriptions, then I gotta go back to the drawing board, cause the universe could totally allow hyper computations.

On a side note, where should I go to read more about hyper-computation?

At first thought, it seems that if it could be falsified, then it would fail the criterion of containing all and only those hypotheses which could in principle be falsified. Kind of like a meta-reference problem: if it does constrain experience, then there are hypotheses which are not interpretable as causal graphs that constrain experience (no matter how unlikely). This is so because the sentence says "all and only those hypotheses that can be interpreted as causal graphs are falsifiable", and for it to be falsified means verifying that there is... (read more)

I have to ask: how does this metaphysics ('cause that's what it is) account for mathematical truths? What causal models do those represent?

My bad:

Someone already asked this more cleverly than I did.

I have a plausibly equivalent candidate (or at least one that implies EY's) for the fabric of real things, i.e., the space of hypotheses which could in principle be true, i.e., the space of beliefs which have sense:

A hypothesis has nonzero probability iff it's computable or semi-computable.

It's rather obviously inspired by Solomonoff abduction, and is a sound principle for any being attempting to approximate the universal prior.

What if the universe permits hyper-computation?

It seems to me that this is the primary thing that we should be working on. If probability is subjective, and causality reduces to probability, then isn't causality subjective, i.e., a function of background knowledge?

This seems not in the least contentious, if you're talking about the map of causality.

Looking it over, I could have been much clearer (sorry). Specifically, I want to know the following. Given a DAG of the form:

A -> C <- B

Is it true that (in all prior joint distributions where A is independent of B, but A is evidence of C, and B is evidence of C) A is non-independent of B, given C is held constant?

I proved that this is so when A & B is evidence against C, and also when A & B is independent of C; the only case I am missing is when A & B is evidence for C.

It's clear enough to me that when you have one non-colliding pat... (read more)

No, but I think it's true if A,B,C are binary. In general, if a distribution p is Markov relative to a graph G, then if something is d-separated in G, then there is a corresponding independence in p. But, importantly, the implication does not always go the other way. Distributions in which the implication always goes the other way are very special and are called faithful.
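
To make the binary case concrete, here is a minimal sketch that builds one joint distribution over the collider A -> C <- B and compares P(A | C) with P(A | B, C). The numbers (`P_A`, `P_B`, `P_C_GIVEN`) are hypothetical values picked for illustration, chosen so that A, B, and A & B are each evidence for C. It checks a single distribution and proves nothing in general.

```python
from itertools import product

# Hypothetical parameters, chosen only for illustration (not from the thread).
P_A = 0.3  # P(A = 1)
P_B = 0.4  # P(B = 1), with B independent of A by construction
# P(C = 1 | A = a, B = b), keyed by (a, b)
P_C_GIVEN = {(0, 0): 0.1, (0, 1): 0.5, (1, 0): 0.6, (1, 1): 0.9}


def joint(a, b, c):
    """Joint probability under the factorisation P(A) P(B) P(C | A, B)."""
    pa = P_A if a else 1 - P_A
    pb = P_B if b else 1 - P_B
    pc = P_C_GIVEN[(a, b)] if c else 1 - P_C_GIVEN[(a, b)]
    return pa * pb * pc


def prob(event, given=lambda a, b, c: True):
    """P(event | given), computed by summing the joint over all eight worlds."""
    worlds = list(product((0, 1), repeat=3))
    denominator = sum(joint(*w) for w in worlds if given(*w))
    numerator = sum(joint(*w) for w in worlds if given(*w) and event(*w))
    return numerator / denominator


def a_true(a, b, c):
    return a == 1


p_a = prob(a_true)
p_a_given_b = prob(a_true, lambda a, b, c: b == 1)
p_a_given_c = prob(a_true, lambda a, b, c: c == 1)
p_a_given_bc = prob(a_true, lambda a, b, c: b == 1 and c == 1)

print(f"P(A)      = {p_a:.4f}")
print(f"P(A|B)    = {p_a_given_b:.4f}")   # equals P(A): marginal independence
print(f"P(A|C)    = {p_a_given_c:.4f}")
print(f"P(A|B,C)  = {p_a_given_bc:.4f}")  # differs from P(A|C): dependence given C
```

With these particular numbers, P(A | C) comes out at roughly 0.54 and P(A | B, C) at roughly 0.44, so learning B lowers the probability of A once C is known, even though A and B are independent marginally.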

I have a question: is d-separation implied by the Kolmogorov axioms?

I've proven that it is in some cases:


1)A = A|B :. A|BC ≤ A|C
2)C < C|A
3)C < C|B
4) C|AB < C

proof starts:
1)B|C > B {via premise 3
2)A|BC = A B C|AB / (C B|C) {via premise 1
3)A|BC C = A B C|AB / B|C
4)A|BC C / A = B C|AB / B|C
5)B C|AB / B|C < C|AB {via line 1
6)B C|AB / B|C < C {via line 5 and premise 4
7)A|BC C / A < C {via lines 6 and 4
8)A|C = A C|A / C
9)A|C C = A C|A
10)A|C C / A = C|... (read more)
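
For readers unfamiliar with the shorthand, here is the visible part of the argument restated in conventional notation, reading "A" as P(A) and "A|BC" as P(A | B, C). It assumes all conditioning events have positive probability. The final two lines are an assumption about where the truncated proof was heading, not a quotation of it.

```latex
\begin{align*}
&\text{Premises: } P(A \mid B) = P(A),\quad P(C \mid A) > P(C),\quad
  P(C \mid B) > P(C),\quad P(C \mid A, B) < P(C). \\[4pt]
&P(B \mid C) = \frac{P(B)\,P(C \mid B)}{P(C)} > P(B)
  && \text{(premise 3)} \\
&P(A \mid B, C) = \frac{P(A)\,P(B)\,P(C \mid A, B)}{P(C)\,P(B \mid C)}
  && \text{(chain rule and premise 1)} \\
&\frac{P(A \mid B, C)\,P(C)}{P(A)} = \frac{P(B)\,P(C \mid A, B)}{P(B \mid C)}
  < P(C \mid A, B) < P(C)
  && \text{(first line, then premise 4)} \\
&\Rightarrow\; P(A \mid B, C) < P(A) \\[4pt]
&\frac{P(A \mid C)\,P(C)}{P(A)} = P(C \mid A) > P(C)
  && \text{(Bayes and premise 2)} \\
&\Rightarrow\; P(A \mid B, C) < P(A) < P(A \mid C)
\end{align*}
```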

I don't understand your question, or your notation. d-separation is just a way of talking about separating sets of vertices in a graph by "blocking" paths. It can't be implied by anything because it is not a statement in a logical language. For "certain" graph/joint distribution pairs, if a d-separation statement holds in the graph, then a corresponding conditional independence statement holds in the joint distribution. This is a statement, and it is proven in Verma and Pearl 1988, as paper-machine below says. Is that the statement you mean? There are lots of interesting true and hard to prove statements one could make involving d-separation. I guess from a model theorist point of view, it's a proof in ZF, but it's high level and "elementary" by model theory standards.
Pearl's textbook cites Verma and Pearl, 1988, but I don't have access to it.

A real deadlock I have with using your algorithmic meta-ethics to think about object-level ethics is that I don't know whose volition, or "should" label, I should extrapolate from. It allows me to figure out what's right for me, and what's right for any group given certain shared extrapolated terminal values, but it doesn't tell me what to do when I am dealing with a population with non-converging extrapolations, or with someone who has different extrapolated values from me (hypothetically).

These individuals are rare, but they likely exist.
