
Comments

Yes, I basically agree: My above comment is only an argument against the most popular halfer model. 

However, in the interest of sparing readers' time, I have to mention that your model has no probability for 'today is Monday' or for 'today is Tuesday'. If they want to see your reasoning for this choice, they should start with the post you linked second instead of the post you linked first.

I had to use keras backend's switch function for the automatic differentiation to work, but basically yes.

I enjoyed the exercise, thanks! 

My solution for the common turtles was setting up the digital cradle such that the mind forged inside was compelled to serve my interests (I wrote a custom loss function for the NN). I used 0.5*segments+x for the vampire one (where I used the x which had the best average gp result for the example vampire population). Annoyingly, I don't remember what I changed between my previous and my current solution, but the previous one was much better 🥲

Looking forward to the next challenge!

Random Musing on Autoregressive Transformers resulting from Taelin's A::B Challenge

Let's model an autoregressive transformer as a Boolean circuit or, for simpler presentation, an n-ary circuit with m inputs and 1 output.

Model the entire system the following way: Given some particular m length starting input:

  1. circuit calculates the output token (/integer) from the input
  2. appends the calculated output token to the end of the input word
  3. deletes first token of input
  4. go to 1
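The loop above can be sketched in a few lines of Python. This is only an illustration: `run_autoregressive` and `toy_circuit` are hypothetical names, and the toy circuit (sum of the window modulo 3, so n = 3) is an arbitrary stand-in for whatever function the real circuit computes.

```python
from typing import Callable, List, Sequence

def run_autoregressive(circuit: Callable[[Sequence[int]], int],
                       window: Sequence[int],
                       steps: int) -> List[int]:
    """Run the four-step loop for a fixed number of iterations.

    Returns the tokens emitted by the circuit, in order.
    """
    window = list(window)
    emitted = []
    for _ in range(steps):
        token = circuit(window)  # step 1: compute the output token
        window.append(token)     # step 2: append it to the end of the input
        window.pop(0)            # step 3: delete the first token
        emitted.append(token)    # step 4 is the loop itself
    return emitted

def toy_circuit(window: Sequence[int]) -> int:
    """Hypothetical circuit over n = 3 tokens: sum of the window mod 3."""
    return sum(window) % 3

tokens = run_autoregressive(toy_circuit, [0, 1, 2], steps=5)
```

The context window always keeps length m, because each iteration adds one token and drops one.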

It's easy to see that, strictly speaking, this system is not very powerful computationally: we have a finite number of possible tokens (n) and a finite-length context window (m), so we only have finitely many possible states (n^m, one per possible content of the window). Therefore our model is only as powerful as a finite state machine (it's pretty much equivalent in its behaviour to a regular grammar containing only rules of the form A → aB).
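One consequence of the finite state space can be checked directly: since the circuit is deterministic and there are at most n^m distinct window contents, the sequence of windows must eventually repeat and then cycle forever. A minimal sketch, assuming the same hypothetical toy circuit as before (sum of the window modulo 3); `find_cycle` is an illustrative name, not something from the original comment.

```python
from typing import Callable, Sequence, Tuple

def find_cycle(circuit: Callable[[Sequence[int]], int],
               window: Sequence[int]) -> Tuple[int, int]:
    """Return (start, period): the step at which the window sequence
    enters its cycle and the length of that cycle."""
    seen = {}                 # window contents -> step at which first seen
    state = tuple(window)
    t = 0
    while state not in seen:  # terminates: only finitely many windows exist
        seen[state] = t
        token = circuit(state)
        state = state[1:] + (token,)  # slide the window by one token
        t += 1
    start = seen[state]
    return start, t - start

def toy_circuit(window: Sequence[int]) -> int:
    """Hypothetical circuit over n = 3 tokens: sum of the window mod 3."""
    return sum(window) % 3

n, m = 3, 3
start, period = find_cycle(toy_circuit, (0, 1, 2))
# By the pigeonhole principle, start + period can never exceed n ** m.
```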

However, real life computers also have finite memory yet we never let that bother us!

How should we manually design our circuit to enable us to solve the most types of problems with an appropriate selection of the initial input?

I think one very straightforward solution is to simply emulate a computer with random access memory the following way:

  • Select some fixed instruction set with k instructions and from our n tokens choose k to correspond to these k instructions.
  • Select another k tokens from the remaining to denote that the given instruction is under execution.
  • Design the circuit such that if the emulated computer's memory is M_t (an m-element vector, where M_{ti} is the ith token) after the execution of the t-th instruction, then our circuit should compute the following tokens (including the starting input): M_{00}, M_{01}, M_{02}, .. M_{0m}, M_{10}, M_{11}, .., M_{1m}, M_{20}, ...
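To make the target behaviour concrete, here is a sketch of the token stream such a circuit would be expected to produce. It does not construct the circuit itself; it only runs an ordinary emulator and concatenates the memory snapshots M_0, M_1, M_2, ... The two-instruction machine (`INC`, `HALT`) and the memory layout are hypothetical, chosen purely for illustration.

```python
from typing import List

# Hypothetical instruction tokens for a toy machine.
INC, HALT = 0, 1

def step(memory: List[int]) -> List[int]:
    """Execute one instruction of the toy machine.

    Assumed layout: memory[0] is the current instruction,
    memory[1] is a counter, memory[2] is the counter's target.
    """
    mem = list(memory)
    if mem[0] == INC:
        mem[1] += 1
        if mem[1] == mem[2]:
            mem[0] = HALT  # stop once the counter reaches its target
    return mem             # HALT leaves the memory unchanged

def snapshot_stream(memory: List[int], steps: int) -> List[int]:
    """Concatenate M_0, M_1, ..., M_steps into one token sequence."""
    stream = list(memory)
    for _ in range(steps):
        memory = step(memory)
        stream.extend(memory)
    return stream

stream = snapshot_stream([INC, 0, 2], steps=3)
```

Each consecutive block of m tokens in `stream` is one memory snapshot, which is the sequence the circuit would have to reproduce token by token.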

This can be done efficiently with relatively few circuit nodes and relatively low depth, but I don't want to write down the construction.

It's interesting to see that actual large autoregressive transformers on human language seem to be fitting this model more and more closely:

  1. With GPT-3 (possibly GPT-2), it was shown that after an instruction is given in the initial prompt, the transformer can execute that instruction in its continuation (e.g. translate this French sentence to English, French: Je mange une pomme, English: ). This corresponds to having a fixed instruction set in the above model (where the instruction set is in common English instead of singular tokens).
  2. With ChatGPT-3.5, and even more with newer models, it was shown that chain-of-thought prompting works better for solving more complex problems than asking for a solution immediately. I think the newest models often don't even require an explicit instruction to break their reasoning down into steps; they often do so anyway. I expect this behaviour to become more and more common as newer models get smarter and also encounter more and more transformer/human interactions in their training set. This corresponds to iteratively calculating M_1, M_2, ... according to the given instructions. However, at this point, the instructions and subsequent "memory snapshots" are all in the transformer's context window.
  3. Might we expect this to change? Will future models be able to notice when the initial prompt or some still relevant previous data is about to exit the context window and autonomously re-generate them and subsequently pick up the calculation where they left off? I expect they will! What do you think?

No she does not. And it's easy to see if you actually try to formally specify what is meant here by "today" and what is meant by "today" in regular scenarios. Consider me calling your bluff about being ready to translate to first order logic at any moment. 

I said that I can translate the math of probability spaces to first order logic, and I explicitly said that our conversation can NOT be translated to first order logic as proof that it is not about math, rather, it's about philosophy. Please, reread that part of my previous comment.

And frankly, it baffles me that you think that you need to explain that it's possible to talk about math using natural language, to a person who has been doing it for multiple posts in a row.

That is not what I explained and I suggest you reread that part. Here it is again:

This whole conversation isn't about math. It is about philosophy. Math is proving theorems in various formal systems. If you are a layman, I imagine you might find it confusing that you can encounter mathematicians who seem to have conversations about math in common English. I can assure you that every mathematician in that conversation is able to translate their comments into the simple language of the given formal system they are working in, they are just simply so much of an expert that they can transmit and receive the given information more efficiently by speaking on a higher level of abstraction.

It is not possible to translate the conversation that we're having to a simple formal system as it's about how we should/can model some aspect of reality (which is famously dirty and complicated) with some specific mathematical object. 

The structure of my argument here is the following: 

  1. Math is about concepts in formal systems, therefore an argument about math can be expressed in some simple, formal language
  2. We are having an argument which can't be translated to a formal system.
  3. Therefore, we are not arguing about math.

The more I post about anthropics the clearer it becomes that I should've started with posting about probability theory 101. My naive hopes that average LessWrong reader is well familiar with the basics and just confused about more complicated cases are crushed beyond salvation.

Ah yes, clearly, the problem is that I don't understand basic probability theory. (I'm a bit sad that this conversation happened to take place with my pseudonymous account.) In my previous comment, I explicitly tried to preempt your confusion about seeing the English word 'experiment' with my paragraph (the part of it that you, for some reason, did not quote), and by specifically linking a wiki which only contains the mathematical part of 'probability', and not the philosophical interpretations that are commonly paired with it, but alas, it didn't matter.

>In particular, Beauty, when awoken, has a certain credence in the statement "Today is Monday."

No she does not. And it's easy to see if you actually try to formally specify what is meant here by "today" and what is meant by "today" in regular scenarios. Consider me calling your bluff about being ready to translate to first order logic at any moment. 

If you are not ready to accept that people have various levels of belief in the statement "Today is Monday" at all times, then I don't think this conversation can go anywhere, to be honest. This is an extremely basic fact about reality.

EDIT: gears, in the first part you selected I'm answering an accusation of bluffing in a matter-of-fact way, how is that too combative? Also, feel free to chime in at any point, it is an open forum after all.

Now, that's not how math works. If you come up with some new concept, be so kind to prove that they are coherent mathematical entities and what are their properties.

This whole conversation isn't about math. It is about philosophy. Math is proving theorems in various formal systems. If you are a layman, I imagine you might find it confusing that you can encounter mathematicians who seem to have conversations about math in common English. I can assure you that every mathematician in that conversation is able to translate their comments into the simple language of the given formal system they are working in, they are just simply so much of an expert that they can transmit and receive the given information more efficiently by speaking on a higher level of abstraction.

It is not possible to translate the conversation that we're having to a simple formal system as it's about how we should/can model some aspect of reality (which is famously dirty and complicated) with some specific mathematical object. 

To be more concrete: I want to show you that we can model (and later that we should indeed) a person's beliefs at some given point in time with probability spaces.

This is inherently a philosophical and not a mathematical problem and I don't see how you don't understand this concept and would appreciate if you could elaborate on this point as much as possible.

You keep insisting that 

By definition of a sample space it can be constructed only from elementary outcomes which has to be mutually exclusive. Tails&Monday and Tails&Tuesday are not mutually exclusive - they happen to the same person in the same iteration of probability experiment during the same outcome of the coin toss. "Centredness" framework attempts to treat them as elementary outcomes, regardless. Therefore, it contradicts the definition of a sample space. 

If we are being maximally precise, then NO: the math of probability spaces prescribes a few formal statements which (this is very important), in some cases, can be used to model experiments and events happening or not happening in reality, but the mathematical objects themselves have no concept of 'experiment' or 'time' or anything like those. I won't copy it here, but you can look these up on the net yourself, if you want: here is one such source. Don't be confused by the wiki sometimes using English words; rest assured, any mathematician could translate it to any sufficiently expressive, simple formal system using variable names like a1, x3564789, etc. (If you really think it would help you and you don't believe what I'm saying otherwise, I can translate it to first order logic for you.)

Now that we hopefully cleared up that we are not arguing about math, it's time for more interesting parts:

Can a probability space model a person's beliefs at a certain point in time?

Yes, it can!

First, I would like to show you that your solution does NOT model a person's belief at a certain time:

  1. People have certain credences in the statement "Today is Monday."
    1. Do note that the above statement is fully about reality and not about math in any way and so it leans on our knowledge about humans and their minds.
    2. You can test it in various ways: e.g. asking people "hey, sorry to bother you, is today Monday?", or setting up an ice cream stand which is only open on Monday in one direction from the lab and another in the opposite direction which is only open on Tuesday, making this fact known to the subjects of an experiment, then asking them to get you ice cream and observing where they go, etc.
  2. In particular, Beauty, when awoken, has a certain credence in the statement "Today is Monday."
    1. This follows from 1.
  3. Your model does not model Beauty's credences in the statement "Today is Monday".
    1. You can see this various ways, and your model is pretty weird, but because I believe you will agree with this, I won't elaborate here, unless asked later.
  4. Therefore, your solution does NOT model a person's belief at a certain time.
    1. This follows from 2 and 3.

Before I go further, I think I will ask you whether everything is clear and whether you agree with everything I wrote so far.


Metapoint: You write a lot of things in your comments with which I usually disagree; however, I think faster replies are more useful in this kind of conversation than complete replies, so at first I'm only going to reply to the points I consider the most important at the time. If you disagree and believe writing complete replies is more useful, do say so (however, my experience in that case is that after a while, instead of writing a comment containing a reply to the whole list of points the other party brought up, I simply drop out of the conversation, and I can't guarantee that this won't happen here).

My whole previous comment was meant to address the part of your comment I quoted. Here it is again:

If everything actually worked then the situation would be quite different. However, my previous post explores how every attempt to model the Sleeping Beauty problem, based on the framework of centred possible worlds fail one way or another. 

With my previous comment I meant to show you that if you don't start out with "centered worlds don't work", you CAN make it work (very important: here, I haven't yet said that this is how it works or how it ought to work, merely that it CAN work without some axiom of probability getting hurt).

Still, I struggle to see what your objection is apart from your intuition that "NO! It can't work!"

When the Beauty doesn't know the actual setting of the experiment she has a different model, fitting her uninformed state of knowledge, when she is told what is actually going on she discards it and starts using the correct model from this post.

Again, I understand that in the theory you built up this is how it would work; that's not what I want to argue (yet). I want to argue that it CAN work in another way, with credences/centeredness/Bayesianism. To counterargue, you would have to show that NO, it can't work that way. You would have to show that, because of some axiom of probability or something similar, we can't model Beauty's credences with probability at the moment she learns the relevant info after waking up.

In probability theory, one outcome of a sample space is realized per an iteration of experiment.

Discard the concept of experiment as it might confuse you. If you want to understand how centered world/credence/bayesian epistemology works (to then see that it DOES work), experiment isn't a good word, because it might lock you into a third-person view, where of course, centeredness does not work (of course, after you understood that bayesianism CAN work, we can reintroduce the word with some nuance). 
 

Your statistical analysis of course also assumes the third-person/non-centered view, so of course it won't help you, but again, we should first talk about whether centeredness CAN work or not. Assuming that it can't and deriving stuff from that does not prove that it can't work.

So no, I do not do this mistake in the text. This is the correct way to talk about Sleeping Beauty. Event "The Beauty is awaken in this experement" is properly defined. Event "The Beauty is awake on this particular day" is not, unless you find some new clever way to do it - feel free to try.

The clever way isn't that clever to be honest. It's literally just: don't assume that it does not work and try it.

If everything actually worked then the situation would be quite different. However, my previous post explores how every attempt to model the Sleeping Beauty problem, based on the framework of centred possible worlds fail one way or another. 

I've read the relevant part of your previous post and I have an idea that might help.

Consider the following problem: "Forgetful Brandon": Adam flips a coin and does NOT show it to Brandon, but shouts YAY! with 50% probability if the coin is HEADS (he does not shout if the coin is TAILS). (Brandon knows Adam's behaviour). However, Brandon is forgetful and if Adam doesn't shout he doesn't do any Bayesian calculation and goes off to have an icecream instead.

Adam doesn't shout. What should Brandon's credence of HEADS be after this?

I hope you agree that Brandon not actually doing the Bayesian calculation is irrelevant to the question. We should still do the Bayesian calculation if we are curious about the correct probability. Anytime Brandon updates, he predictably updates in the direction of HEADS, but again: do we care about this? Should we point out a failure of conservation of expected evidence? Again, I say NO: what evidence is actually updated on in the thought experiment isn't relevant to the correct theoretical Bayesian calculation. We could also imagine a thought experiment with a person who does Bayesian calculations wrong every time, but that would still be irrelevant to the correct credence. If you agree, I don't see why you object to Sleeping Beauty not doing the calculation in case she is not awakened. (Which is the only objection you wrote under the "Frequency Argument" model.)
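For concreteness, the Bayesian calculation for Forgetful Brandon can be written out as follows. This is a minimal sketch using only the numbers stated in the problem (fair coin, Adam shouts with probability 1/2 on HEADS and never on TAILS); the function and constant names are illustrative.

```python
from fractions import Fraction

P_HEADS = Fraction(1, 2)              # fair coin
P_SHOUT_GIVEN_HEADS = Fraction(1, 2)  # Adam shouts half the time on HEADS
P_SHOUT_GIVEN_TAILS = Fraction(0)     # Adam never shouts on TAILS

def posterior_heads(shout: bool) -> Fraction:
    """P(HEADS | observation) by Bayes' rule."""
    like_heads = P_SHOUT_GIVEN_HEADS if shout else 1 - P_SHOUT_GIVEN_HEADS
    like_tails = P_SHOUT_GIVEN_TAILS if shout else 1 - P_SHOUT_GIVEN_TAILS
    joint_heads = like_heads * P_HEADS
    joint_tails = like_tails * (1 - P_HEADS)
    return joint_heads / (joint_heads + joint_tails)

p_shout = P_SHOUT_GIVEN_HEADS * P_HEADS + P_SHOUT_GIVEN_TAILS * (1 - P_HEADS)

# Average of the correct posteriors, weighted by how often each
# observation occurs (whether or not Brandon actually computes them).
expected_posterior = (p_shout * posterior_heads(True)
                      + (1 - p_shout) * posterior_heads(False))
```

Here `posterior_heads(False)` is 1/3 and `posterior_heads(True)` is 1, and their weighted average equals the prior 1/2. Conservation of expected evidence holds for the correct calculation; it only appears to fail if we count just the updates Brandon bothers to perform.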

EDIT: I see that later you refer back to another post supposedly addressing a related argument; however, as that would be the fifth step of my recursion, I will postpone inspecting it until tomorrow. But obviously, you can't give the same response to Forgetful Brandon, as in this case Brandon does observe the non-shout, he just doesn't update on it. You also declare P(Awake|Heads) not to be 1/2 and give "Beauty is awakened on Heads no matter what" as the reason. You often make this mistake in the text, but here it's too important not to mention that "Awake" does not mean "Beauty is awakened", it means "Beauty is awake" (don't forget that centeredness!) and, of course, Beauty is not awake if it is Tuesday and the coin is heads.

EDIT2: I'm also curious what you would say about the problem with the following modification ("Uninformed Sleeping Beauty"): Initially the full rules of the experiment are NOT explained to Beauty, only that she will have to sleep in the lab, she will get a drug on Monday night which will make her forget her day and that she may or may not be awakened on Monday/Tuesday. 

However, when she awakens the full rules are explained to her, ie that she will not get awakened on Tuesday if the coin is HEADS. 

Note that in this case you can't object that the prior distribution gives non-zero probability to Tuesday&Heads as Beauty unquestionably has 1/4 credence in that before they explain the full rules to her.

EDIT3: I missed that Beauty might think it's Wednesday too in the previous case before being told the full rules, so let's consider instead the following ("Misinformed Sleeping Beauty"): Initially the full rules of the experiment are NOT explained to Beauty, only that she will have to sleep in the lab and that she will get a drug on Monday night which will make her forget her day. Furthermore, she is told the falsehood that she will be awakened on Monday AND Tuesday whatever happens!

However, when she awakens the full rules are explained to her, ie that she won't get/wouldn't have gotten awakened on Tuesday if the coin is HEADS. 

Note that in this case you can't object that the prior distribution gives non-zero probability to Tuesday&Heads as Beauty unquestionably has 1/4 credence in that before they explain the actual rules to her.
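As a sketch of how the centred calculation could go in the Misinformed Sleeping Beauty case: start from the 1/4 credence in each (coin, day) combination described above, and model learning the actual rules while awake as ordinary conditioning on "not (Tuesday and Heads)". That modelling step is an assumption made for illustration (it is exactly the point under dispute), and the helper name `condition` is hypothetical.

```python
from fractions import Fraction
from itertools import product

# Misinformed Beauty's credences on waking, before the real rules are
# explained: 1/4 for each (coin, day) combination.
prior = {
    (coin, day): Fraction(1, 4)
    for coin, day in product(("Heads", "Tails"), ("Monday", "Tuesday"))
}

def condition(dist, keep):
    """Restrict a distribution to outcomes where keep(outcome) is true
    and renormalise so the remaining probabilities sum to 1."""
    total = sum(p for outcome, p in dist.items() if keep(outcome))
    return {o: p / total for o, p in dist.items() if keep(o)}

# Assumption: being awake and told the real rules rules out Tuesday & Heads.
posterior = condition(prior, lambda o: o != ("Heads", "Tuesday"))

p_heads = sum(p for (coin, _), p in posterior.items() if coin == "Heads")
p_monday = sum(p for (_, day), p in posterior.items() if day == "Monday")
```

Under that assumption the posterior is a perfectly ordinary probability distribution: three outcomes with credence 1/3 each, so no axiom of probability is violated by the centred model.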

I wasn't sure either, but looked at the previous post to check which one is intended.
