All of Oscar_Cunningham's Comments + Replies

I've only been on Mastodon a bit longer than the current Twitter immigrants, but as far as I know there's no norm against linking. But the server admins are all a bit stressed by the increased load. So I can understand why they'd be annoyed by any link that brought new users. I've been holding off on inviting new users to the instance I'm on, because the server is only just coping as it is.

3jefftk1mo
It sounds like that by other people linking without consent they received unwanted attention: "I realised that some people had cross-posted my Mastodon post into Twitter. ... I struggled to understand what I was feeling, or the word to describe it. I finally realised on Monday that the word I was looking for was "traumatic". In October I would have interacted regularly with perhaps a dozen people a week on Mastodon, across about 4 or 5 different servers. Suddenly having hundreds of people asking (or not) to join those conversations without having acclimatised themselves to the social norms felt like a violation, an assault.* I guess what I'm trying to understand is whether Mastodon has some norm of "don't bring attention to people's writing without their consent"?

Apologies for asking an object level question, but I probably have Covid and I'm in the UK which is about to experience a nasty heatwave. Do we have a Covid survival guide somewhere?

(EDIT: I lived lol)

Is there a way to alter the structure of a futarchy to make it follow a decision theory other than EDT?

Or is that still too closely tied to the explore-exploit paradigm?

Right. The setup for my problem is the same as the 'bernoulli bandit', but I only care about the information and not the reward. All I see on that page is about exploration-exploitation.

What's the term for statistical problems that are like exploration-exploitation, but without the exploitation? I tried searching for 'exploration' but that wasn't it.

In particular, suppose I have a bunch of machines which each succeed or fail independently with a probability that is fixed separately for each machine. And suppose I can pick machines to sample to see if they succeed or fail. How do I sample them if I want to become 99% certain that I've found the best machine, while using the fewest number of samples?

The difference with exploration-exploitat... (read more)

2MondSemmel6mo
From Algorithms to Live By, I vaguely recall the multi-armed bandit problem [https://en.wikipedia.org/wiki/Multi-armed_bandit]. Maybe that's what you're looking for? Or is that still too closely tied to the explore-exploit paradigm?

The problem is that this measures their amount of knowledge about the questions as well as their calibration.

My model would be as follows. For a fixed source of questions, each person has a distribution describing how much they know about the questions. It describes how likely it is that a given question is one they should say p on. Each person also has a calibration function f, such that when they should say p they instead say f(p). Then by assigning priors over the spaces of these distributions and calibration functions, and applying Bayes' rule we get a... (read more)

2Davidmanheim8mo
Agreed that the proposal combines knowledge with calibration, but your procedure doesn't actually seem implementable.

I was expecting a post about tetraethyllead.

You just have to carefully do the algebra to get an inductive argument. The fact that the last digit is 5 is used directly.

Suppose n is a number that ends in 5, and such that the last N digits stay the same when you square it. We want to prove that the last N+1 digits stay the same when you square n^2.

We can write n = m*10^N + p, where p has N digits, and so n^2 = m^2*10^(2N) + 2mp*10^N + p^2. Note that since 2p ends in 0, the term 2mp*10^N actually divides by 10^(N+1). Then since the two larger terms divide by 10^N+1, n^2 agrees with p^2 on its last N+1 d... (read more)

2Viliam9mo
Amazing!

If we start with 5 and start squaring we get 5, 25, 625, 390625, 152587890625.... Note how some of the end digits are staying the same each time. If we continue this process we get a number ...8212890625 which is a solution of x^2 = x. We get another solution by subtracting this from 1 to get ...1888119476.

4Viliam9mo
I may be missing something, but it is not obvious to me why the number of digits at the end that stay the same is necessarily always increasing. I mean, the last N digits of x^2 depend on the last N digits of x. But it requires an explanation why the last N+1 digits of x^2 would be determined by the last N digits of x. I can visualize a possibility that perhaps at certain place a digit in 5^(2^k) starts changing periodically. Why would that not be the case? Also, is there something special about 5, or is the same true for all numbers?

if not, then we'll essentially need another way to define determinant for projective modules because that's equivalent to defining an alternating map?

There's a lot of cases in mathematics where two notions can be stated in terms of each other, but it doesn't tell us which order to define things in.

The only other thought I have is that I have to use the fact that  is projective and finitely generated. This is equivalent to  being dualisable. So the definition is likely to use  somewhere.

I'm curious about this. I can see a reasonable way to define  in terms of sheaves of modules over : Over each connected component,  has some constant dimension , so we just let  be  over that component.

If we call this construction  then the construction I'm thinking of is . Note that  is locally -dimensional, so my construction is locally isomorphic to yours but globally twisted. It depends on  via more than just its... (read more)

2AlexMennen9mo
Oh right, I was picturingWbeing free on connected components when I suggested that. Silly me. Fis alternating ifF(f∘g)=det(g)F(f), right? So if we're willing to accept kludgy definitions of determinant in the process of definingΛW(V), then we're all set, and if not, then we'll essentially need another way to define determinant for projective modules because that's equivalent to defining an alternating map?

I agree that 'credence' and 'frequency' are different things. But round here the word 'probability' does refer to credence rather than frequency. This isn't a mistake; it's just the way we're using words.

1Daniel_Eth9mo
Okay, but I've also seen rationalists use point estimates for probability in a way that led them to mess up Bayes, and such that it would be clear if they recognized the probability was uncertain (e.g., I saw this a few times related to covid predictions). I feel like it's weird to use "frequency" for something that will only happen (or not happen) once, like whether the first AGI will lead to human extinction, though ultimately I don't really care what word people are using for which concept.

Another thing to say is if  then

.

My intuition for  is that it tells you how an infinitesimal change accumulates over finite time (think compound interest). So the above expression is equivalent to . Thus we should think 'If I perturb the identity matrix, then the amount by which the unit cube grows is proportional to the extent to which each vector is being stretched in the direction it was already pointing'.

2cousin_it9mo
Hmm, this seems wrong but fixable. Namely, exp(A) is close to (I+A/n)^n, so raising both sides of det(exp(A))=exp(tr(A)) to the power of 1/n gives something like what we want. Still a bit too algebraic though, I wonder if we can do better.

Thank you for that intuition into the trace! That also helps make sense of .

2cousin_it9mo
Interesting, can you give a simple geometric explanation?

This is close to one thing I've been thinking about myself. The determinant is well defined for endomorphisms on finitely-generated projective modules over any ring. But the 'top exterior power' definition doesn't work there because such things do not have a dimension. There are two ways I've seen for nevertheless defining the determinant.

  • View the module as a sheaf of modules over the spectrum of the ring. Then the dimension is constant on each connected component, so you can take the top exterior power on each and then glue them back together.
  • Use the fact
... (read more)
4AlexMennen9mo
I'm curious about this. I can see a reasonable way to defineΛW(V)in terms of sheaves of modules overSpec(R): Over each connected component,Whas some constant dimensionn, so we just letΛW(V)beΛn(V)over that component. But it sounds like you might not like this definition, and I'd be interested to know if you had a better way of definingΛW(V)(which will probably end up being equivalent to this). [Edit: Perhaps something in terms of generators and relations, with the generators being linear mapsW→V?]

I think the determinant is more mathematically fundamental than the concept of volume. It just seems the other way around because we use volumes in every day life.

3Ege Erdil9mo
I think the good abstract way to think about the determinant is in terms of induced maps on the top exterior power. If you have an n dimensional vector space V and an endomorphism L:V→V, this induces a map ∧nV→∧nV, and since ∧nV is always one-dimensional this map must be of the form v→kv for some scalar k in the ground field. It's this k that is the determinant of L. This is indeed more fundamental than the concept of volume. We can interpret exterior powers as corresponding to volume if we're working over a local field, for example, but actually the concept of exterior power generalizes far beyond this special case. This is why the determinant still preserves its nice properties even if we work over an arbitrary commtuative ring; since such rings still have exterior powers behaving in the usual way. I didn't present it like this in this post because it's actually not too easy to introduce the concept of "exterior power" without the post becoming too abstract.

https://www.lesswrong.com/posts/AAqTP6Q5aeWnoAYr4/?commentId=WJ5hegYjp98C4hcRt

I don't dispute what you say. I just suggest that the confusing term "in the worst case" be replaced by the more accurate phrase "supposing that the environment is an adversarial superintelligence who can perfectly read all of your mind except bits designated 'random'".

In this case P is the cumulative distribution function, so it has to approach 1 at infinity, rather than the area under the curve being 1. An example would be 1/(1+exp(-x)).

3Robert Kennedy9mo
Actually, for any given P which works, P'(x)=P(x)/10 is also a valid algorithm.

A simple way to do this is for ROB to output the pair of integers {n, n+1} with probability K((K-1)/(K+1))^|n|, where K is some large number. Then even if you know ROB's strategy the best probability you have of winning is 1/2 + 1/(2K).

If you sample an event N times the variance in your estimate of its probability is about 1/N. So if we pick K >> √N then our probability of success will be statistically indistinguishable from 1/2.

The only difficulty is implementing code to sample from a geometric distribution with a parameter so close to 1.

The original problem doesn't say that ROB has access to your algorithm, or that ROB wants you to lose.

1Tapatakt9mo
In such problems, it is usually assumed that your solution have to work (in this case work = better than 50% accuracy) always, even in the worst case, when all unknowns are against you.
4dxu9mo
From near the end of the post you linked: In the case of Robert's problem, since ROB is specified to have access to your algorithm, it effectively "moves second", which does indeed place it in a position to "use its intelligence on you". So it should be unsurprising that a mixed strategy can beat out a pure strategy. Indeed, this is why mixed strategies are sometimes optimal in game-theoretic situations: when adversaries are involved, being unpredictable has its advantages. Of course, this does presume that you have access to a source of randomness that even ROB cannot predict. If you do not have access to such a source, then you are in fact screwed. But then the problem becomes inherently unfair, and thus of less interest. (From a certain perspective, even this is compatible with Eliezer's main message: for an adversary to be looking over your shoulder, "moving second" relative to you, and anti-optimizing your objective function, is indeed a special case of things being "worse than random". The problem is that producing a "superior derandomized algorithm" in such a case requires inverting the "move order" of yourself and your adversary, which is not possible in many scenarios.)

Right, I should have chosen a more Bayesian way to say it, like 'suceeds with probability greater than 1/2’.

 The intended answer for this problem is the Frequentist Heresy in which ROB's decisions are treated as nonrandom even though they are unknown, while the output of our own RNG is treated as random, even though we know exactly what it is, because it was the output of some 'random process'.

Instead, use the Bayesian method. Let P({a,b}) be your prior for ROB's choice of numbers. Let x be the number TABI gives you. Compute P({a,b}|x) using Bayes' Theorem. From this you can calculate P(x=max(a,b)|x). Say that you have the highest number if this is over 1/2

... (read more)
1Bojadła9mo

Some related discussions: 1. https://www.conwaylife.com/forums/viewtopic.php?f=2&t=979 2. https://www.conwaylife.com/forums/viewtopic.php?f=7&t=2877 3. https://www.conwaylife.com/forums/viewtopic.php?p=86140#p86140

My own thoughts.

  • Patterns in GoL are generally not robust. Typically changing anything will cause the whole pattern to disintegrate in a catastrophic explosion and revert to the usual 'ash' of randomly placed small still lifes and oscillators along with some escaping gliders.

  • The pattern Eater 2 can eat gliders along 4 adjacent lanes.

... (read more)

Yes, I'm a big fan of the Entropic Uncertainty Principle. One thing to note about it is that the definition of entropy only uses the measure space structure of the reals, whereas the definition of variance also uses the metric on the reals as well. So Heisenberg's principle uses more structure to say less stuff. And it's not like the extra structure is merely redundant either. You can say useful stuff using the metric structure, like Hardy's Uncertainty Principle. So Heisenberg's version is taking useful information and then just throwing it away.

I'd almos... (read more)

A good source for the technology available in the Game of Life is the draft of Nathaniel Johnston and Dave Greene's new book "Conway’s Game of Life: Mathematics and Construction".

3Ramana Kumar1y
Thanks! I'd had a bit of a look through that book before and agree it's a great resource. One thing I wasn't able to easily find is examples of robust patterns. Does anyone know if there's been much investigation of robustness in the Life community? The focus I've seen seems to be more on particular constructions (used in its entirety as the initial state for a computation), rather than on how patterns fare when placed in various ranges of different contexts.

The if the probabilities of catching COVID on two occasions are x and y, the the probability of catching it at least once is 1 - (1 - x)(1 - y) which equals x + y - xy. So if x and y are large enough for xy to be significant, then splitting is better because even though catching it the second time will increase your viral load, it's not going to make it twice as bad as it already was.

The link still works for me. Perhaps you must first become a member of that discord? Invite link: https://discord.gg/nZ9JV5Be (valid for 7 days)

1itaibn02y
Thanks. I also found an invite link in a recent reddit post about this discussion (was that by you?).

The weird thing is that there are two metrics involved: information can propagate through a nonempty universe at 1 cell per generation in the sense of the l_infinity metric, but it can only propagate into empty space at 1/2 a cell per generation in the sense of the l_1 metric.

https://en.wikipedia.org/wiki/Norm_(mathematics)#p-norm

You're probably right, but I can think of the following points.

Its rule is more complicated than Life's, so its worse as an example of emergent complexity from simple rules (which was Conway's original motivation).

It's also a harder location to demonstrate self replication. Any self replicator in Critters would have to be fed with some food source.

Yeah, although probably you'd want to include a 'buffer' at the edge of the region to protect the entity from gliders thrown out from the surroundings. A 1,000,000 cell thick border filled randomly with blocks at 0.1% density would do the job.

2paulfchristiano2y
That seems great. Is there any reason people talk a lot about Life instead of Critters? (Seems like Critters also supports universal computers and many other kinds of machines. Are there any respects in which it is known to be less rich than Life?)

This is very much a heuristic, but good enough in this case.

Suppose we want to know how many times we expect to see a pattern with n cells in a random field of area A. Ignoring edge effects, there are A different offsets at which the pattern could appear. Each of these has a 1/2^n chance of being the pattern. So we expect at least one copy of the pattern if n < log_2(A).

In this case the area is (10^60)^2, so we expect patterns of size up to 398.631. In other words, we expect the ash to contain any pattern you can fit in a 20 by 20 box.

8Alex Flint2y
So just to connect this back to your original point: if we knew that it were possible to construct some kind of intelligent entity in a region with area of, say, 1,000,000,000 cells, then if our overall grid had 21,000,000,000 total cells and we initialized it at random, then we would expect an intelligent entity to pop up by chance at least once in the whole grid.

The glider moves at c/4 diagonally, while the c/2 ships move horizontally. A c/2 ship moving right and then down will reach its destination at the same time the c/4 glider does. In fact, gliders travel at the empty space speed limit.

2AprilSR2y
Huh. Something about the way speed is calculated feels unintuitive to me, then.

Most glider guns in random ash will immediately be destroyed by the chaos they cause. Those that don't will eventually reach an eater which will neutralise them. But yes, such things could pose a nasty surprise for any AI trying to clean up the ash. When it removes the eater it will suddenly have a glider stream coming towards it! But this doesn't prove it's impossible to clear up the ash.

Making a 'ship sensor' is tricky. If it collides with something unexpected it will create more chaos that you'll have to clear up.

This sounds like you're treating the area as empty space, whereas the OP specifies that it's filled randomly outside the area where our AI starts.

3gwern2y
OP said I can initialize a large chunk as I like (which I initialize to be empty aside from my constructors to avoid interfering with placing the pixels), and then the rest might be randomly or arbitrarily initialized, which is why I brought up the wall of still-life eaters to seal yourself off from anything that might then disrupt it. If his specific values don't give me enough space, but larger values do, then that's an answer to the general question as nothing hinges on the specific values.

My understanding was that we just want to succeed with high probability. The vast majority of configurations will not contain enemy AIs.

3Alex Flint2y
Yeah success with high probability was how I thought the question would need to be amended to deal with the case of multiple AIs. I mentioned this in the appendix but should have put it in the main text. Will add a note.

See here https://conwaylife.com/forums/viewtopic.php?f=7&t=1234&sid=90a05fcce0f1573af805ab90e7aebdf1 and here https://discord.com/channels/357922255553953794/370570978188591105/834767056883941406 for discussion of this topic by Life hobbyists who have a good knowledge of what's possible and not in Life.

What we agree on is that the large random region will quickly settle down into a field of 'ash': small stable or oscillating patterns arranged at random. We wouldn't expect any competitior AIs to form in this region since an area of 10^120 will only ... (read more)

4JenniferRM1y
I performed the same experiment with glider guns and got the same result "stalagmite of ash" result. I didn't use that name for it, but I instantly recognized my result under that description <3 When I performed that experiment, I was relatively naive about physics and turing machines and so on, and sort of didn't have the "more dakka [https://www.lesswrong.com/posts/z8usYeKX7dtTWsEnk/more-dakka]" gut feeling that you can always try crazier things once you have granted that you're doing math, and so infinities are as nothing to your hypothetical planning limits. Applying that level of effort to Conway's Life... something that might be interesting would be to play with 2^N glider guns in parallel, with variations in their offsets, periodicities, and glider types, for progressively larger values of N? Somewhere in all the variety it might be possible to generate a "cleaning ray"? If fully general cleaning rays are impossible, that would also be an interesting result! (Also, it is the result I expect.) My current hunch is that a "cleaning ray with a program" (that is allowed to be fed some kind of setup information full of cheat codes about the details of the finite ash that it is aimed at) might be possible. Then I would expect there to be a lurking result where there was some kind of Maxwell's Demon style argument [https://physicstoday.scitation.org/doi/10.1063/PT.3.2912] about how many bits of cheatcode are necessary to clean up how much random ash... and then you're back to another result confirming the second law of thermodynamics, but now with greater generality about a more abstract physics? I haven't done any of this, but that's what my hunch is.
1itaibn02y
Thanks for linking to my post! I checked the other link, on Discord, and for some reason it's not working.
7Raphaël S2y
Could you just explain a bit "will only be likely to contain arbitrary patterns of sizes up to log(10^120)" please ? Or give some pointers with other usage of such calculation ?
3Alex Flint2y
Interesting! Thank you for these links
1Measure2y
At a large enough scale the ash field would probably contain glider guns that would restart the chaos and expand the ash frontier.

Another good one is the spell 'Assume for contradiction!', which when you are trying to prove p gives you the lemma ¬p.

The rule in modal logic is that we can get ⊢□p from ⊢p, not that we can get □p from p.

True:

If PA proves p, then PA proves that PA proves p.

False:

If p, then PA proves p.

EDIT: Maybe it would clarify to say that '⊢p' and '□p' both mean 'PA (or whichever theory) can prove p', but '⊢' is used when talking about PA, whereas '□' is used when talking within PA.

From our vantage point of ZFC, we can see that PA is in fact consistent. But we know that PA can't prove its own consistency or inconsistency. So the classic example of a system which is consistent but unsound is PA + ¬Con(PA). This system is consistent since deriving a contradiction in it would amount to a proof by contradiction of consistency in PA, which we know is impossible. But it's unsound since it falsely believes that PA is not consistent.

Your proof of 'consistency → soundness' goes wrong in the following way:

Suppose no soundness: ¬(□p→p); then

... (read more)
1Adrià Garriga-alonso2y
So it was that necessitation is outside of PA. Thank you! It seems to be assumed in modal logic [https://en.wikipedia.org/wiki/L%C3%B6b's_theorem#Modal_rules_of_inference] (or at least a relevant modal logic) though. Which would imply that that modal logic assumes it is sound. It's still a bit weird to me that Löb's theorem is also true in that case.

You can't write down '∀p: Provable(p)→p' in PA, because in order to quantify over sentences we have to encode them as numbers (Gödel numbering).

We do have a formula Provable, such that when you substitute in the Gödel number p you get a sentence Provable(p) which is true if and only if the sentence p represents is provable. But we don't have a formula True, such that True(p) is true if and only if the sentence p represents is true. So the unadorned p in '∀p: Provable(p)→p' isn't possible. No such formula is possible since otherwise you could use diagonaliz... (read more)

4Gurkenglas2y
Then apparently "PA can't prove its own soundness." is an even weaker true statement among the ones one might choose to remember :).

I think you've got one thing wrong. The statement isn't consistency, it's a version of soundness. Consistency says that you can't prove a contradiction, in symbols simply . Whereas soundness is the stronger property that the things you prove are actually true, in symbols . Of course first order logic can't quantify over sentences, so you can't even ask the question of whether PA can prove itself sound. But what you can ask is whether PA can prove itself sound for particular statements, i.e. whether it can prove for some s.

What Löb's t... (read more)

1Adrià Garriga-alonso2y
I think you're right, the statement is intuitively soundness, but I'm still confused. I would be very grateful if you could clear it up. Is "soundness → consistency" what you mean by stronger? If we can assume necessitation, soundness and consistency are equivalent, proof below. Does that mean we cannot assume the necessitation proof rule [https://en.wikipedia.org/wiki/Normal_modal_logic], that is if ⊢p, add ⊢□p? This would be consistent with "PA cannot prove its own completeness", the necessitation rule would imply that completeness is true. First, we prove that consistency implies soundness by modus tollens. Suppose no soundness: ¬(□p→p); then □p∧¬p. From ¬p, by necessitation □¬p, and so overall □( p∧¬p), that is □⊥. We have proven ¬(□p→p)→□⊥, by modus tollens ¬□⊥→(□p→p). Second, we can prove that soundness implies consistency without necessitation, by modus tollens and contradiction. Suppose □⊥. Further supposing □p→p clearly proves ⊥, so we reject the assumption, thus ¬(□p→p). Thus we prove □⊥→¬(□p→p), by modus tollens (□p→p)→¬□⊥. Where did I go wrong? Is it the necessitation deduction rule? Is it that the necessitation deduction rule is valid, but cannot be used within a statement of PA?

Tails can alternate between fat and thin as you go further out. If heights were normally distributed with the same mean and variance then there would be fewer people above 7ft than there are now, but the tallest man would be taller than the tallest man now.

North Korea were caught cheating in 1991 and given a 15 year ban until 2007. They were also disqualified from the 2010 IMO because of weaker evidence of cheating. Given this, an alternative hypothesis is that they have also been cheating in other years and weren't caught. The adult team leaders at the IMO do know the problems in advance, so cheating is not too hard.

1Oskar Mathiasen2y
The Danish team leader (the closest source to these events i have talked to) seems to (personally) believe the cheating allegations in 2010 or at least that the evidence was insufficient. Also note their non participation in 2017 and 2018 for reasons not known to me and in 2020 likely because the event was online

One other argument I've seen for Kelly is that it's optimal if you start with $a and you want to get to $b as quickly as possible, in the limit of b >> a. (And your utility function is linear in time, i.e. -t.)

You can see why this would lead to Kelly. All good strategies in this game will have somewhat exponential growth of money, so the time taken will be proportional to the logarithm of b/a.

So this is a way in which a logarithmic utility might arise as an instrumental value while optimising for some other goal, albeit not a particularly realistic one.

Load More