8y2

*What is Mathematics?* by Courant and Robbins is a classic exploration that goes reasonably deep into most areas of math.

This makes me think of two very different things.

One is informational containment, ie how to run an AGI in a simulated environment that reveals nothing about the system it's simulated on; this is a technical challenge, and if interpreted very strictly (via algorithmic complexity arguments about how improbable our universe is likely to be in something like a Solomonoff prior), is very constraining.

The other is futurological simulation; here I think the notion of simulation is pointing at a tool, but the idea of using this tool is a very small part of the ap...

Certainly, interventions may be available, just as for anything else; but it's not fundamentally more accessible or malleable than other things.

I'm arguing that the fuzzy-ish definition that corresponds to our everyday experience/usage is better than the crisp one that doesn't.

Re IQ and "way of thinking", I'm arguing they both affect each other, but neither is entirely under conscious control, so it's a bit of a moot point.

Apropos the original point, under my usual circumstances (not malnourished, hanging out with smart people, reading and thinking about engaging, complex things that can be analyzed and have reasonable success measures, etc), my IQ is mostly not under my control. (Perhaps if I was more focused on measurements, nootropics, and getting enough sleep, I could increase my IQ a bit; but not very much, I think.) YMMV.

8y10

I think what you're saying is that if we want a coherent, nontrivial definition of "under our control" then the most natural one is "everything that depends on the neural signals from your brain". But this definition, while relatively clean from the outside, doesn't correspond to what we ordinarily mean; for example, if you have a mental illness, this would suggest that "stop having that illness!!" is reasonable advice, because your illness is "under your control".

I don't know enough neuroscience to give this a physi...

0[anonymous]8y

Uh.. "stop having that illness!" is reasonable advice. Seek help. Try
medication. Enter into psychotherapy. I'm not sure what you are objecting to
there?

08y

Well, you're right that in the mental illness case my definition works badly,
but I can't think of a better precise definition right now (can you?);
probably something like selecting a specific "sub-process" in brain which is
related to the conscious experience, but it's fuzzy and I'm not even sure that
such separation is possible.
I have a feeling that it is a rephrasing of "things under your control".
Actually, I'm arguing that the causal arrows point in the opposite direction:
if I were to change your IQ, I could change your way of thinking. The rest of
the article is about what happens if we assume IQ is fixed (which somewhat
resembles Bayesian inference).

29y

It clearly stipulates 12:01 am to avoid just this kind of confusion.
Further, the chapter will be posted at 10:00 am on Tuesday.
So the deadline is Monday night.

If you want to discuss the nature of reality using a similar lexicon to what philosophers use, I recommend consulting the Stanford Encyclopedia of Philosophy: http://plato.stanford.edu/

2[anonymous]9y

I have a very strong philosophical background. I've discussed many of those
topics with the authors.
Basically, what I'm trying to do is draw attention to something that is
usually missed by people engaging with these topics.
That is: absolute is not objective.
There is a fundamental disconnect with the way most people organize truth and
reality.
They do not have clear concepts of objective and absolute. The sequence on how to
use words is basically six parables stating that words are not absolute. It's such
a simple point, but most people can look right at that sentence and not have
the foggiest clue what it means.
Traditionally (in the history of philosophy) the Rationalist is the lone
defender of the distinction between objective and absolute.
I'm curious if that tradition is held up by contemporary rationalists.

Musk has joined the advisory board of FLI and CSER, which are younger sibling orgs of FHI and MIRI. He's aware of the AI xrisk community.

Cool. Regarding bounded utility functions, I didn't mean you personally, I meant the generic you; as you can see elsewhere in the thread, some people do find it rather strange to think of modelling what you actually want as a bounded utility function.

This is where I thought you were missing the point:

Or you might say it's a suboptimal outcome because you just know that this allocation is bad, or something. Which amounts to saying that actually you know what the utility function should be and it isn't the one the analysis assumes.

Sometimes we (seem to) ...

Certainly given a utility function and a model, the best thing to do is what it is. The point was to show that some utility functions (eg using the exponential-decay sigmoid) have counterintuitive properties that don't match what we'd actually want.

Every response to this post that takes the utility function for granted and remarks that the optimum is the optimum is missing the point: we don't know what kind of utility function is reasonable, and we're showing evidence that some of them give optima that aren't what we'd actually want if we were turning the ...

29y

No, it doesn't seem strange to me to consider representing what I want by a
bounded utility function. It seems strange to consider representing what I want
by a utility function that converges exponentially fast towards its bound.
I'll repeat something I said in another comment:
(Remark 1: the above is a comment that remarks that the optimum is the optimum
but is visibly not missing the point by failing to appreciate that we might be
constructing a utility function and trying to make it do good-looking things,
rather than approximating a utility function we already have.)
(Remark 2: I think I can imagine situations in which we might consider making
the relationship between chocolate and utility converge very fast -- in fact,
taking "chocolate" literally rather than metaphorically might yield such a
situation. But in those situations, I also think the results you get from your
exponentially-converging utility function aren't obviously unreasonable.)

09y

You still haven't answered my question of why we don't want those properties. To
me, they don't seem counter-intuitive at all.

One nonconstructive (and wildly uncomputable) approach to the problem is this one: http://www.hutter1.net/publ/problogics.pdf

I think you're making the wrong comparisons. If you buy $1 worth, you get p(win) * U(jackpot) + (1-p(win)) * U(-$1), which is more-or-less p(win)*U(jackpot)+U(-$1); this is a good idea if p(win) * U(jackpot) > -U(-$1). But under usual assumptions -U(-$2)>-2U(-$1). This adds up to normality; you shouldn't actually spend all your money. :)
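The concavity point can be checked numerically. Here is a minimal sketch with a hypothetical square-root utility; the wealth level, win probability, and jackpot are made-up illustration values, not numbers from the thread:

```python
import math

WEALTH = 100.0  # hypothetical current wealth, for illustration only

def U(delta):
    """Concave (risk-averse) utility of a change in wealth: sqrt utility."""
    return math.sqrt(WEALTH + delta) - math.sqrt(WEALTH)

# Under a concave U, losing the second dollar hurts more than the first,
# i.e. -U(-$2) > -2*U(-$1), so buying many tickets is worse than buying one.
assert -U(-2) > -2 * U(-1)

# A $1 ticket is worth buying iff p(win) * U(jackpot) > -U(-$1).
p_win, jackpot = 1e-8, 1e7  # made-up numbers
worth_buying = p_win * U(jackpot) > -U(-1)
```

The inequality is what makes this "add up to normality": each additional ticket is a worse deal than the last.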

011y

Of course you are right, silly mistake.
(Not really important nitpick:) The dollar is spent once the ticket is bought
and doesn't come back even if you win, so you shouldn't have (1-p(win)) *
U(-$1) there, but just U(-$1).

One good negation is "the value/intrinsic utility of a life is the sum of the values/intrinsic utilities of all the moments/experiences in it, evaluated without reference to their place/context in the life story, except inasmuch as is actually part of that moment/experience".

The "actually" gets traction if people's lives follow narratives that they don't realize as they're happening, but such that certain narratives are more valuable than others; this seems true.

12y5

If your prior distribution for "yes" conditional on the number of papers is still uniform, i.e. if the number of papers has nothing to do with whether they're "yes" or not, then the rule still applies.

112y

Add-on:
You can make the analogy clearer if you imagine that, instead of rummaging around in
a hat, you lined up all the slips of paper in random order and read them one at
a time. Then it makes sense that the total number of slips of paper shouldn't
matter.

You can comfortably do Bayesian model comparison here; have priors for µcon, µamn, and µsim, and let µpat be either µamn (under hypothesis Hamn) or µsim (under hypothesis Hsim), and let Hamn and Hsim be mutually exclusive. Then integrating out µcon, µamn, and µsim, you get a marginal odds-ratio for Hamn vs Hsim, which tells you how to update.

The standard frequentist method being discussed is nested hypothesis testing, where you want to test null hypothesis H0 with alternative hypothesis H1, and H0 is supposed to be nested inside H1. For instance you could ...

"Alice is a banker" is a simpler statement than "Alice is a feminist banker who plays the piano". That's why the former must be assigned greater probability than the latter.

Complexity weights apply to worlds/models, not propositions. Otherwise you might as well say:

"Alice is a banker" is a simpler statement than "Alice is a feminist, a banker, or a pianist". That's why the former must be assigned greater probability than the latter.

012y

Agreed. Instead of complexity, I should have probably said "specificity".
"Alice is a banker" is a less complicated statement than "Alice is a feminist, a
banker, or a pianist", but a more specific one.

12y2

tl;dr : miscalibration means mentally interpreting loglikelihood of data as being more or less than its actual loglikelihood; to infer it you need to assume/infer the Bayesian calculation that's being made/approximated. Easiest with distributions over finite sets (i.e. T/F or multiple-choice questions). Also, likelihood should be called evidence.

I wonder why I didn't respond to this when it was fresh. Anyway, I was running into this same difficulty last summer when attempting to write software to give friendly outputs (like "calibration") to a bu...

The way I'd try to do this problem mentally would be:

Relative to the desired concentration of 55%, each unit of 40% is missing .15 units of alcohol, and each unit of 85% has .3 extra units of alcohol. .15:.3=1:2, so to balance these out we need (amount of 40%):(amount of 85%)=2:1, i.e. we need twice as much 40% as 85%. Since we're using 1kg of 40%, this means 0.5kg of 85%.
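A quick numeric check of the balancing argument above, using the quantities from the comment:

```python
target = 0.55

# Per kg, the 40% solution is missing alcohol and the 85% solution has extra.
deficit_per_kg_40 = target - 0.40   # 0.15 units of alcohol missing per kg of 40%
surplus_per_kg_85 = 0.85 - target   # 0.30 extra units per kg of 85%

# Deficits must cancel surpluses: deficit * kg_40 == surplus * kg_85.
kg_40 = 1.0
kg_85 = deficit_per_kg_40 * kg_40 / surplus_per_kg_85  # ~0.5 kg

# Sanity check: the blend really comes out at 55% alcohol.
concentration = (0.40 * kg_40 + 0.85 * kg_85) / (kg_40 + kg_85)
assert abs(concentration - 0.55) < 1e-12
```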

312y

That's clever! Changing your frame of reference is a useful tool - there are a
lot of problems which become simpler if you use measurements from a 'zero' that
you pick.

Nope: the odds ratio was (.847/(1-.847))/(.906/(1-.906)), which is indeed 57.5%, which could be rounded to 60%. If the starting probability was, say, 1%, rather than 90.6%, then translating the odds ratio statement to "60% as likely" would be legitimate, and approximately correct; probably the journalist learned to interpret odds ratios via examples like that. But when the probabilities are close to 1, it's more correct to say that the women/blacks were 60% *more* likely to *not* be referred.
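To make the arithmetic explicit, using the two probabilities quoted above:

```python
# The two referral probabilities quoted above.
p_group, p_white_men = 0.847, 0.906

def odds(p):
    return p / (1 - p)

odds_ratio = odds(p_group) / odds(p_white_men)
assert 0.57 < odds_ratio < 0.58  # the "57.5%" in the comment (to rounding)

# Near probability 1, the interpretable quantity is the complement:
# the relative risk of NOT being referred is about 1.6x.
rr_not_referred = (1 - p_group) / (1 - p_white_men)
assert 1.6 < rr_not_referred < 1.7
```

This is why translating an odds ratio directly into "60% as likely" only works when both probabilities are small.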

213y

Hmmm. I would have said that white men were 60% as likely to not be referred.
(This is the first time I've seen the golden ratio show up in a discussion of
probability!)

It's just a vanilla (MH) MCMC sampler for (some convenient family of) distributions on polytopes; hopefully like this: http://cran.r-project.org/web/packages/limSolve/vignettes/xsample.pdf , but faster. It's motivated by a model for inferring network link traffic flows from counts of in- and out-bound traffic at each node; the solution space is a polytope, and we want to take advantage of previous observations to form a better prior. But for the approach to be feasible we first need to sample.

But this is not a long-term project, I think.

Currently I'm taking classes and working on a polytope sampler. I tend to be excited about Bayesian nonparametrics and consistent families of arbitrary-dimensional priors. I'm also excited about general-purpose MCMC-like approaches, but so far I haven't thought very hard about them.

013y

What is a polytope sampler? Link to work?

013y

It seems like you might want to check this guy's work out.

13y8

In undergrad I feared a feeling of locked-in-ness, and ditched my intention to do a PhD in math (which I think I could have done well in) partly for this reason, though it was also easier for me because I hadn't established close ties to a particular line of research, and because I had programming background. I worked a couple of years in programming, and now I'm back in school doing a PhD in stats, because I like probability spaces and because I wanted to do something more mathematical than (most) programming. I guess I picked stats over applied math partly out of the same worry about overspecialization; I think stats has a bigger wealth of better-integrated more widely applicable concepts/insights.

013y

I am curious: what do you plan to work on in stats?
I personally think more people should be working on efficient general sampling
methods for Bayesian stats, for reasons I have written about here:
http://goodmorningeconomics.wordpress.com/2010/11/16/the-promise-of-bayesian-statistics-pt-2/
.
Programming skills are very useful there. I am a programmer and one of my
hobbies is implementing bayes stats algorithms in the literature. Do let me know
if you come up with anything revolutionary.

Would you be surprised if the absolute value was bigger than 3^^^3? I'm guessing yes, very much so. So that's a reason not to use an improper prior.

If there's no better information about the problem, I sort of like using crazy things like Normal(0,1)*exp(Cauchy); that way you usually get reasonable smallish numbers, but you aren't shocked by huge or tiny numbers either. And it's proper.
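A sketch of what sampling from such a prior looks like (using NumPy; the sample size is arbitrary):

```python
import numpy as np

rng = np.random.default_rng(0)
n = 100_000

# Draw from the "crazy" prior: Normal(0,1) * exp(Cauchy).
# exp(Cauchy) occasionally overflows to inf; heavy tails are the point,
# so we just silence the overflow warning.
with np.errstate(over='ignore'):
    samples = rng.standard_normal(n) * np.exp(rng.standard_cauchy(n))

# Most of the mass sits at ordinary scales...
assert 0.1 < np.median(np.abs(samples)) < 10
# ...but huge magnitudes still show up often enough not to shock the prior.
assert np.any(np.abs(samples) > 1e6)
```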

113y

Let's say that you know the variable is a real number in [0,1], but nothing
else...

I wasn't trying to present a principled distinction, or trying to avoid bias. What I was saying isn't something I'm going to defend. The only reason I responded to your criticism of it was that I was annoyed by the nature of your objection. However, since now I know you thought I was trying to say more than I actually was, I will freely ignore your objection.

Do you have an instance of "I proactively do X" where you do not class it as reactive? Do you have an instance of "I wish to avoid Y" where you do not class it as specific? I don't like conversations about definitions. I was using these words to describe a hypothetical inner experience; I don't claim that they aren't fuzzy. You seem to be pointing at the fuzziness and saying that they're meaningless; I don't see why you'd want to do that.

413y

My point is that 1 and 2 above don't seem to differ fundamentally in either of
the two descriptors you used.
Conversations about definitions of words are not useful, but definitions of
concepts are necessary. I'm pointing at the fuzziness because it indicates to me
that the supposed distinction is not being made based on any principle, but
simply to rationalize a preexisting bias.

It seems to me that we mean different things by the words "reactive" (as opposed to proactive) and "specific". A weak attempt at a reductio: I proactively do X to avoid facing Y; I am thus reacting to my desire to avoid facing Y. And is Y general or specific? Y is the specific Y that I do X to avoid facing.

513y

1. A person doesn't want to have a baby, so she has an abortion to stop the
fetus from developing into one.
2. A person doesn't want to have a fetus, so she uses contraception to stop the
ovum and sperm from developing into one.
If 1 is reactive, then so is 2.
For a given fetus, there is a finite possibility space of all the persons into
which it could develop, taking into account different values of unknown future
parameters. The same can be said of any combination of sperm and ova; it's just
that the possibility space is larger. How would one derive a concept of
"specific" that discriminates between the fetus space and the sperm/ova space
without drawing an arbitrary line based on the size of the space?

Ah, yes indeedy true. I guess I was thinking of abstinence. So wrong distinction. More likely, then: abortion is done to a specific embryo who is thereby prevented from being, and it's done reactively; there's no question that when you have an abortion it's about deciding to kill this particular embryo. Contraceptive use on the other hand is nonspecific and proactive; it doesn't feel like "I discard these reproductive cells which would have become a person!", it feels like exerting prudent control over your life.

813y

Every time contraception is used, it prevents a specific multitude of "potential
humans" from existing. Sure, most of them would have been prevented from
existing by other factors, but contraception still actively contributes to that.
It's also done reactively, in that it's a reaction to someone's desire to have
sex with a lower risk of pregnancy. It may not feel the same way as abortion,
but that's just because it's easier for humans to value fetuses than sperm and
egg cells. Both abortion and contraception have specific and reactive
components, in principle.

I agree with your main point (that this is a stumbling block for some people), but there are others who will contend that A and part of B (namely the irreversible error) do apply to unwanted babies (usually, or on average), and that the reason why abortion is more evil than contraception is because it's an error of commission rather than omission.

913y

Killing adults is less reversible in the sense that if you kill comedian Carlos
Mencia, you can't get a new Carlos Mencia if you change your mind. In contrast,
babies are basically fungible.

913y

I think taking birth control precautions is pretty commission-y. Abstinence would
be the omission version of not having babies.

But I drink orange juice with pulp; then the fiber is no longer absent, though I guess it's reduced. The vitamins and minerals are still present, though, aren't they?

213y

Are you making this juice yourself by chucking a whole orange in the blender and
then drinking it?
In that case, you probably - I don't know - have enough fiber that it's not that
much different from just eating an orange, and fresh juices are said to be more
nutritious than bought anyway. (Admittedly, the people who say this are people
who own juicers, but that's probably beside the point.)
But if you're buying it from the store, then... no. It's still mostly just sugar
with a little bit of texture floating in it.
If you're not gulping it by the gallon daily I wouldn't worry about it, but it's
part of your healthy balanced breakfast - and not a huge part :)

113y

You still get an enormous amount of sugar, with or without the pulp.
Regarding the vitamins and minerals, my understanding is that you need a certain
amount of each of those to avoid various nasty and fatal diseases, and an amount
over a certain limit can be poisonous, but there isn't any real evidence that
anything in-between makes a difference. From what I understand, it also requires
a very extreme diet (by modern developed world standards) to develop provably
harmful micronutrient deficiencies.
(One exception might be vitamin D if the winters are especially dark and cold
where you live, but you won't get that one from fruit juice.)

Regarding the fruit juices, I agree that fruit-flavored mixtures of HFCS and other things generally aren't worth much, but aren't proper fruit juices usually nutritious? (I mean the kinds where the ingredients consist of fruit juices, perhaps water, and nothing else.)

613y

One orange is one or two servings of fruit... but a serving of orange juice is
four oranges.
You're getting all the sugar and calories of four oranges (4 - 8 servings of
fruit!) without any of the fiber.
Fruit juices aren't exactly the devil, but they're not especially nutritious
either.

013y

I like real juice, but (except for orange juice with pulp) I always water it
down. It tastes the same when compared to long-term memory (although not when
directly compared).

413y

Fruit juices are very bad. They concentrate the sugar content of a lot of fruit
into a small mass and volume. For instance, apple juice is usually considerably
more sugary than Pepsi, at around 11-12 g sugar per 100 g, and it also has a
worse sugar profile: 66% fructose, compared to the 55% of HFCS as it is
commonly used in soft drinks (note: fructose is the worse sugar). Other fruit
juices are usually above 8% sugar too.

413y

They're still high in sugar relative to how much you are likely to consume, and
don't offer the fiber or unprocessed-ness of entire fruit. It would usually be
better to either eat a piece of fruit or drink water. (I ignore this advice
because I hate water, so when I thirst between meals I drink juice.)

13y14

Regarding investment, my suggestion (if you work in the US) is to open a basic (because it doesn't periodically charge you fees) E*TRADE account here. They will provide an interface for buying and selling shares of stocks and various other things (ETFs and such; I mention stocks and ETFs because those are the only things I've tried doing anything with). They will charge you $10 for every transaction you make, so unless you're going to be (or become) active/clever enough to make it worthwhile, it makes sense not to trade too frequently.

EDIT: These guys appe...

013y

Scottrade is another well-known company that provides the same services. They
only charge $7 per transaction (more for penny stocks). I've had a
very positive experience with them.
One thing to keep in mind is that stock trading will make your taxes more
complicated and more expensive to file.

13y27

I feel like it is useful to mention that because of efficient markets (which implies assets are "fairly priced") and the benefits of diversification (lower risk), it's almost always better to buy a low fee mutual fund than any particular stocks or bonds. In particular, Index Funds merely keep a portfolio which tracks a broad market index. These often have very low operating costs, so they are a pretty good way to invest. You can buy these as ETFs, or you can buy them through something like Vanguard.

13y28

This is right. But to put it much more generally, and as an exercise in seriously trying to bridge information gaps:

To buy stocks you need what is called a Brokerage account. The way a brokerage account works is that you give money to the Broker to invest for you. (Generally, you will do this by transferring it from an existing bank account.) This money generally gets put into a highly liquid account in your name, such as a money market fund. You can get your money back by instructing your broker to send it back to you.

When you want to buy stocks or other...

13y3

Echoing the others:

If we suppose these are 22 iid samples from a Poisson then the max likelihood estimate for the Poisson parameter is 0.82 (the sample mean). Simulating such draws from such a Poisson and looking at sample correlation between Jan 15-Feb 4 and Jan 16-Feb 5, the p-value is 0.1. And when testing Poisson-ness vs negative binomial clustering (with the same mean), the locally most powerful test uses statistic (x-1.32)^2, and gives a simulated p-value of 0.44.
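For readers who want to reproduce this kind of check, here is a hedged sketch of a simulation-based p-value for the lag-1 correlation; the `counts` array is hypothetical stand-in data, not the actual 22 observations:

```python
import numpy as np

rng = np.random.default_rng(1)

# Hypothetical daily counts standing in for the 22 observations.
counts = np.array([0, 1, 0, 2, 1, 0, 0, 1, 3, 0, 1,
                   0, 2, 0, 1, 1, 0, 0, 2, 1, 1, 1])
lam = counts.mean()  # max-likelihood Poisson rate is the sample mean

def lag1_corr(x):
    """Sample correlation between the series and itself shifted by one day."""
    return np.corrcoef(x[:-1], x[1:])[0, 1]

# Simulated null distribution of the statistic under iid Poisson(lam).
observed = lag1_corr(counts)
sims = [lag1_corr(rng.poisson(lam, size=len(counts))) for _ in range(5000)]
p_value = np.mean([abs(s) >= abs(observed) for s in sims])
assert 0.0 <= p_value <= 1.0
```

The same skeleton works for the Poisson-vs-negative-binomial test: swap the statistic for the locally most powerful one and re-simulate.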

What I don't like about the example you provide is: what player 1 and player 2 know needs to be common knowledge. For instance if player 1 doesn't know whether player 2 knows whether die 1 is in 1-3, then it may not be common knowledge at all that the sum is in 2-6, even if player 1 and player 2 are given the info you said they're given.

This is what I was confused about in the grandparent comment: do we really need I and J to be common knowledge? It seems so to me. But that seems to be another assumption limiting the applicability of the result.

As far as I understand, agent 1 doesn't know that agent 2 knows A2, and agent 2 doesn't know that agent 1 knows A1. Instead, agent 1 knows that agent 2's state of knowledge is in J and agent 2 knows that agent 1's state of knowledge is in I. I'm a bit confused now about how this matches up with the meaning of Aumann's Theorem. Why are I and J common knowledge, and {P(A|I)=q} and {P(A|J)=q} common knowledge, but I(w) and J(w) are not common knowledge? Perhaps that's what the theorem requires, but currently I'm finding it hard to see how I and J being common...

014y

Then agent 1 knows that agent 2 knows one of the members of J that have
nonempty intersection with I(w), and similarly for agent 2.
Presumably they have to tell each other which of their own partitions w is in,
right? ie, presumably SOME sort of information sharing happens about each
other's conclusions.
And, once that happens, seems like intersection I(w) and J(w) would be their
resultant common knowledge.
I'm confused still though what the "meet" operation is.
Unless... the idea is something like this: they exchange probabilities. Then
agent 1 reasons "J(w) is a member of J that both intersects I(w) AND
would assign that particular probability, so I can determine the subset of
I(w) that intersects with those" and determines a probability from there. And
similarly for agent 2. Then they exchange probabilities again, and go through an
equivalent reasoning process to tighten the spaces a bit more... and the theorem
ensures that they'd end up converging on the same probabilities? (Each time they
state unequal probabilities, they each learn more information, and each one then
comes up with a set that's a strict subset of the one they were previously
considering, but each of their sets always contains the intersection of I(w) and
J(w).)

That simplification is a situation in which there is no common knowledge. In world-state w, agent 1 knows A1 (meaning knows that the correct world is in A1), and agent 2 knows A2. They both know A1 union A2, but that's still not common knowledge, because agent 1 doesn't know that agent 2 knows A1 union A2.

I(w) is what agent 1 knows, if w is correct. If all you know is S, then the only thing you know agent 1 knows is I(S), and the only thing that you know agent 1 knows agent 2 knows is J(I(S)), and so forth. This is why the usual "everyone knows that everyone knows that ... " definition of common knowledge translates to I(J(I(J(I(J(...(w)...).

114y

Well, how is it not the intersection then?
ie, Agent 1 knows A1 and knows that Agent 2 knows A2
If they trust each other's rationality, then they both know that w must be in A1
and be in A2
So they both conclude it must be in intersection of A1 and A2, and they both
know that they both know this, etc etc...
Or am I missing the point?

Huh? The reference set Ω is the set of possible world histories, out of which one element is the actual world history. I don't see what's wrong with this.

014y

I suppose my post was poorly worded. Yes, in this case omega is the reference
set for possible world histories.
What I was referring to was the baseline of w as an accurate measure. It is a
normalizing reference, though not a set.

Nope; it's the limit of I(J(I(J(I(J(I(J(...(w)...), where I(S) for a set S is the union of the elements of I that have nonempty intersections with S, i.e. the union of I(x) over all x in S, and J(S) is defined the same way.

Alternately if instead of I and J you think about the sigma-algebras they generate (let's call them sigma(I) and sigma(J)), then sigma(I meet J) is the intersection of sigma(I) and sigma(J). I prefer this somewhat because the machinery for conditional expectation is usually defined in terms of sigma-algebras, not partitions.
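For small finite spaces the limit can be computed by iterating the two coarsening steps until a fixed point; here is a sketch (the helper names are mine, not standard):

```python
def coarsen(partition, s):
    """I(S): union of the cells of the partition that intersect the set s."""
    return frozenset().union(*(cell for cell in partition if cell & s))

def common_knowledge_cell(I, J, w):
    """Cell of the meet containing w: the limit of I(J(I(J(...({w})...))))."""
    s = frozenset([w])
    while True:
        nxt = coarsen(J, coarsen(I, s))
        if nxt == s:
            return s
        s = nxt

# Two different two-cell partitions of {1,2,3,4}: the meet is the whole space.
I = [frozenset({1, 2}), frozenset({3, 4})]
J = [frozenset({1, 3}), frozenset({2, 4})]
assert common_knowledge_cell(I, J, 1) == frozenset({1, 2, 3, 4})

# Identical partitions: the meet is just the partition itself.
assert common_knowledge_cell(I, I, 1) == frozenset({1, 2})
```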

114y

Then... I'm having trouble seeing why I^J wouldn't very often converge on the
entire space.
ie, suppose a super simplification in which both agent 1 and agent 2 partition
the space into only two parts, agent 1 partitioning it into I = {A1, B1}, and
agent 2 partitioning into J = {A2, B2}
Suppose I(w) = A1 and J(w) = A2
Then, unless the two partitions are identical, wouldn't (I^J)(w) = the entire
space? or am I completely misreading? And thanks for taking the time to explain.

Right, that is a good piece. But I'm afraid I was unclear. (Sorry if I was.) I'm looking for a prior over stationary sequences of digits, not just sequences. I guess the adjective "stationary" can be interpreted in two compatible ways: either I'm talking about sequences such that for every possible string w the proportion of substrings that are equal to w, among all substrings of length |w|, tends to a limit as you consider more and more substrings (either extending forward or backward in the sequence); this would not quite be a p...

014y

Janos, I spent some days parsing your request and it's quite complex. Cosma
Shalizi's thesis and algorithm seem to address your problem in a frequentist
manner, but I can't yet work out any good Bayesian solution.

Each element of the set is characterized by a bunch of probabilities; for example there is p_01101, which is the probability that elements x_{i+1} through x_{i+5} are 01101, for any i. I was thinking of using the topology induced by these maps (i.e. generated by preimages of open sets under them).

How is putting a noninformative prior on the reals hard? With the usual required invariance, the uniform (improper) prior does the job. I don't mind having the prior be improper here either, and as I said I don't know what invariance I should want; I can't think o...

014y

One issue with, say, taking a normal distribution and letting the variance go to
infinity (which is the improper prior I normally use) is that the posterior
distribution is going to have a finite mean, which may not be a
desired property of the resulting distribution.
You're right that there's no essential reason to relate things back to the
reals; I was just using that to illustrate the difficulty.
I was thinking about this a little over the last few days, and it occurred to me
that one model for what you are discussing might actually be an infinite
graphical model. The elements of the infinite bi-directional sequence here are
the values of Bernoulli-distributed random variables. Probably the most
interesting case for you would be a Markov random field, as the stochastic
'patterns' you were discussing may be described in terms of dependencies
between random variables.
Here are three papers I read a little while back on the topic of (and related
to) something called the Indian Buffet process:
(http://www.cs.utah.edu/~hal/docs/daume08ihfrm.pdf)
(http://cocosci.berkeley.edu/tom/papers/ibptr.pdf)
(http://www.cs.man.ac.uk/~mtitsias/papers/nips07.pdf)
These may not quite be what you are looking for, since they deal with a bound on
the extent of the interactions; you probably want to think about probability
distributions over binary matrices with an infinite number of rows and columns
(which would correspond to an adjacency matrix over an infinite graph).

214y

Something about this discussion reminds me of a hilarious text:
The moral of this story seems to be, Assume priors over generators, not over
sequences. A noninformative prior over the reals will never learn that the digit
after 0100 is more likely to be 1, no matter how much data you feed it.

The purpose would be to predict regularities in a "language", e.g. to try to achieve decent data compression in a way similar to other Markov-chain-based approaches. In terms of properties, I can't think of any nontrivial ones, except the usual important one that the prior assign nonzero probability to every open set; mainly I'm just trying to find something that I can imagine computing with.

It's true that there exists a bijection between this space and the real numbers, but it doesn't seem like a very natural one, though it does work (it's measurable, etc). I'll have to think about that one.

114y

What topology are you putting on this set?
I made the point about the real numbers because it shows that putting a
non-informative prior on the infinite bidirectional sequences should be at least
as hard as for the real numbers (which is non-trivial).
Usually a regularity is defined in terms of a particular computational model,
so if you picked Turing machines (or the variant that works with a
bidirectionally infinite tape, which computes basically the same class as a
tape that is infinite in one direction), then you could instead begin
constructing your prior in terms of Turing machines. I don't know if that
helps any.

14y4

Since we're discussing (among other things) noninformative priors, I'd like to ask: does anyone know of a decent (noninformative) prior for the space of stationary, bidirectionally infinite sequences of 0s and 1s?

Of course in any practical inference problem it would be pointless to consider the infinite joint distribution, and you'd only need to consider what happens for a finite chunk of bits, i.e. a higher-order Markov process, described by a bunch of parameters (probabilities) which would need to satisfy some linear inequalities. So it's easy to find a ...
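A sketch of those constraints for the order-1 case (parameter names are mine; the point is that stationarity forces the length-n block probabilities to be consistent marginals of the length-(n+1) ones):

```python
from itertools import product

def block_probs(p1, p_11, p_01, n):
    """Length-n block probabilities of a stationary binary Markov chain
    with stationary P(X=1) = p1 and transitions P(1|1) = p_11,
    P(1|0) = p_01 (assumed consistent: p1 == p1*p_11 + (1-p1)*p_01)."""
    trans = {(0, 0): 1 - p_01, (0, 1): p_01,
             (1, 0): 1 - p_11, (1, 1): p_11}
    probs = {}
    for w in product((0, 1), repeat=n):
        p = p1 if w[0] else 1 - p1
        for a, b in zip(w, w[1:]):
            p *= trans[(a, b)]
        probs[w] = p
    return probs

# The linear constraints in question: for every block w,
#   p(w) == p(w0) + p(w1) == p(0w) + p(1w),
# together with p(w) >= 0 and the length-n blocks summing to one.
```

Any assignment of block probabilities satisfying those (in)equalities at every length extends, by Kolmogorov consistency, to a stationary process, so a prior over the finite-dimensional parameters is a prior over the chunk behavior.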

114y

I suppose it depends what you want to do. First, I would point out that the
set is in bijection with the real numbers (think of two simple injections and
then use Cantor–Bernstein–Schroeder), so you can use any prior over the real
numbers. The fact that you want to look at infinite sequences of 0s and 1s
seems to imply that you are considering a specific type of problem that would
demand a very particular meaning of 'noninformative prior'. What I mean is
that a 'noninformative prior' usually incorporates some kind of invariance:
e.g. a uniform prior on [0,1] for the parameter of a Bernoulli distribution
treats every possible location of the true value in the interval the same way.
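To spell out that Bernoulli example (a sketch; the numerical cross-check is my own addition): under the uniform prior, the posterior predictive after k successes in n trials is the rule of succession (k+1)/(n+2), and no location in [0,1] is privileged a priori:

```python
def laplace_rule(k, n):
    """Posterior predictive P(next = 1) under a uniform prior on the
    Bernoulli parameter: Laplace's rule of succession."""
    return (k + 1) / (n + 2)

def predictive_numeric(k, n, steps=200000):
    """The same quantity by midpoint integration of the posterior,
    as a sanity check on the closed form."""
    num = den = 0.0
    for i in range(steps):
        t = (i + 0.5) / steps
        w = t ** k * (1 - t) ** (n - k)
        num += t * w
        den += w
    return num / den
```

With no data the predictive is 0.5, exactly the symmetry-under-relabeling one would demand of "noninformative" here.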

0[anonymous]14y

Overcoming Bias. :-)

I am trying to understand the examples on that page, but they seem strange; shouldn't there be a model with parameters, and a prior distribution for those parameters? I don't understand the inferences. Can someone explain?

014y

Well, the first example is a model with a single parameter. Roughly speaking,
the Bayesian initially believes that the true model is either a Gaussian around
1, or a Gaussian around -1. The actual distribution is a mix of those two, so
the Bayesian has no chance of ever arriving at the truth (the prior for the
truth is zero), instead becoming over time more and more comically overconfident
in one of the initial preposterous beliefs.
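A sketch of that dynamic (variable names mine): with unit-variance Gaussians at ±1, the per-point log likelihood ratio simplifies to 2x, so under the mixture the posterior log-odds perform a zero-mean random walk and drift arbitrarily far toward one extreme:

```python
import math
import random

def posterior_prob_plus(xs):
    """P(model = N(+1,1) | data) with a 50/50 prior over
    {N(+1,1), N(-1,1)}; the per-point log likelihood ratio is 2x."""
    log_odds = sum(2 * x for x in xs)
    if log_odds > 700:    # avoid overflow in exp
        return 1.0
    if log_odds < -700:
        return 0.0
    return 1.0 / (1.0 + math.exp(-log_odds))

rng = random.Random(0)
# truth: an equal mixture of the two -- which gets prior probability zero
xs = [rng.gauss(rng.choice((-1.0, 1.0)), 1.0) for _ in range(10000)]
p = posterior_prob_plus(xs)
# p typically ends up within a hair of 0 or 1: confident either way, and wrong
```

Since the walk's excursions grow like the square root of the sample size, the posterior does not hover near 0.5 but lurches to near-certainty in one of the two misspecified models.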

I think you're confusing the act of receiving information/understanding about an experience with the experience itself.

Re: the joke example, I think one would get tired of hearing a joke too many times, and that's what the dissection is equivalent to, because you keep hearing it in your head; but if you already get the joke, the dissection is not really adding to your understanding. If you didn't get the joke, you will probably feel a twinge of enjoyment at the moment when you finally do understand. If you don't understand a joke, I don't think you...

014y

I think you make an important distinction, but people sometimes act like gaining
understanding will result in a long-term reduction in some warm fuzzies for
them. They sometimes explicitly tell me they think this will happen. While I
think people may underestimate the net warm fuzzies resulting from learning
(i.e. they are biased), I'm confident that they are sometimes correct. The
difficult question is deciding what we should do about this.
Don't get me wrong, I'm still very committed to epistemic rationality and will
try to sell people on its many virtues/benefits.

414y

Indeed, my wife and I have practiced for well over a decade how to get optimum
endorphin release from casual contact. (For example, we've identified certain
spots we can apply hand pressure to on the other person that create a sensation
we call "recharging" -- a kind of relaxed energy.)

Interesting. My internal experience of programming is quite different; I don't see boxes and lines. Data structures for me are more like people who answer questions, although of course with no personality or voice; the voice is mine as I ask them a question, and they respond in a "written" form, i.e. with a silent indication. So the diagrams people like to draw for databases and such don't make direct sense to me per se; they're just a way of organizing written information.

I am finding it quite difficult to describe such things coherently and correctly; I have no certainty about any part of this, except that I know I don't imagine black-and-white box diagrams.

014y

That is a good question for a statistician, and I am not a statistician.
One thing that leaps to mind, however, is two-boxing on Newcomb's Problem using
assumptions about the prior probability of box B containing $1,000,000. Some new
work using math that I don't begin to understand suggests that either response
to Newcomb's problem is defensible using Bayesian nets.
There could be more trivial cases, too, where a person inputs unreasonable prior
probabilities and uses cargo-cult statistics to support some assertion.
Also, it's struck me that a frequentist statistician might call most Bayesian
uses of the theorem "abuses."
I'm not sure those are really good examples, but I hope they're satisfying.

Is there a reason to think this problem is less amenable to being solved by complexity priors than other learning problems? / Might we build an unaligned agent competent enough to be problematic without solving problems similar to this one?