rossry's Comments

Bayesian examination

However, the post assumes that 1) there is (or should be) one correct answer, 2) that answer is of the form (1, 0, 0, 0) or a permutation thereof, and 3) the material being examined is independent of the scoring system (it doesn't itself cover probability, for example).

These are assumed for the sake of explanation, but none are necessary; in fact, the scoring rule and analysis go through verbatim for questions whose answers are arbitrary vectors of numbers, with any number of nonzero entries, and even with randomness in the realized answers. The correct choice is still to guess, for each potential answer, your expectation of that answer's realized result.
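
To make that concrete, here's a minimal numeric sketch (my own illustration, not from the post), using a quadratic (Brier-style) scoring rule on a question whose realized "answer vector" has several independently random components. The probabilities and reports below are made up; the point is just that reporting the vector of expectations has the best expected score:

```python
import numpy as np

rng = np.random.default_rng(0)

# A question whose realized "answer vector" is random: each component is 1
# with some probability; they need not sum to 1 and several can be 1 at once.
p_true = np.array([0.6, 0.3, 0.3, 0.1])

def expected_score(report, n=200_000):
    """Monte Carlo estimate of the expected quadratic score (higher is better)."""
    outcomes = (rng.random((n, p_true.size)) < p_true).astype(float)
    return -np.sum((outcomes - report) ** 2, axis=1).mean()

print(expected_score(p_true))                    # report = the expectation: best
print(expected_score(np.array([1, 0, 0, 0.0])))  # a "one right answer" report: worse
print(expected_score(p_true + 0.1))              # any shifted report: also worse
```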

Open & Welcome Thread - November 2019

just because "I don't want to see more of this" doesn't mean it's up to me to influence whether anyone else can see it.

I feel like this proves more than you want. For example, is it up to you to influence whether someone sees more of something, just because you want to see more of it?

Similarly, it's also helpful to get a reason for upvotes, but requiring that a reason be given can reduce the amount of information-aggregation that occurs, on some margins. What justifies an asymmetry between how we aggregate positive information and how we aggregate negative information? Or would you also argue that upvotes should come with reasons?

Open & Welcome Thread - November 2019

I mean a weighted sum where weights add to unity.

An optimal stopping paradox

You need an exponentially increasing reward for your argument to go through. In particular, this step doesn't hold:

Since at each moment in time, you face the exact same problem (linearly increasing reward, α-exponentially decaying survival rate)

The problem isn't exactly the same, because the ratio of the (linear) growth rate to the current value is decreasing over time. At some point the value equals (growth rate)/α (I think that's the right expression?), your marginal value of waiting hits 0 (and keeps decreasing), and you sell.
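
A quick numeric check of that claim (my own sketch; A, k, and α are made-up parameters): maximize the expected payoff (A + k·t)·e^(−α·t) over selling times t, and compare the asset's value at the optimum with (growth rate)/α = k/α.

```python
import numpy as np

A, k, alpha = 10.0, 5.0, 0.2   # made up: starting value, linear growth rate, hazard rate

t = np.linspace(0, 50, 200_001)
expected_payoff = (A + k * t) * np.exp(-alpha * t)   # value if you sell at t, times survival

t_star = t[np.argmax(expected_payoff)]
print(A + k * t_star)   # value of the asset at the optimal selling time...
print(k / alpha)        # ...matches growth rate / alpha (25.0 here)
```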

If the ratio of growth rate to current value is constant over time, then you're in the same position at each step, but then it's either the St. Petersburg paradox or worthless.

Open & Welcome Thread - November 2019

Sorry, I'm writing pretty informally here. I'm pretty sure there are senses in which these arguments can be made formal, though I'm not really interested in working through that, mostly because I don't think the formality wins us anything interesting here.

Some notes, though (still in a fairly informal mode):

My intuition that the only way to combine the two estimates without introducing a bias or an assumed prior is by a mixture comes from modeling each estimate (as a random variable) as the true value plus some idiosyncratic noise. Any function of the two estimates is then an expression in terms of the true value, each estimator's noise, and maybe some constants. But "unbiased" implies that setting the noise terms to 0 should leave the expression equal to the true value (in expectation). Without making assumptions about the actual distribution of true values, that means the expression has to be exactly 1 times the true value (plus maybe some other noise you don't want, which I think you can get rid of). And the only way to get there from the noisy estimates is a mixture.

By "assembly", I'm proposing to treat each estimate as a larger number of estimates with the same mean and larger variance, such that they form equivalent evidence. Intuitively, this works out if the count goes as the square of the variance ratio. Then I claim that the natural thing to do with many estimates each of the same variance is to take a straight average.

But they're distributions, not observations.

Sure, formally each observer's posterior is a distribution. But if you treat "observer 1's posterior is Normally distributed, with mean μ₁ and standard deviation σ₁" as an observation you make as a Bayesian (who trusts observer 1's estimation and calibration), it gets you there.

Open & Welcome Thread - November 2019

Ah, okay. In that case, here are a few attempts to ground the idea philosophically:

  1. It's the "prior-free" estimate with the least error. See that unbiased "prior-free" estimates must be mixtures of the (unbiased) estimates, and that biased estimates are dominated by being scaled to fit. So the best you can do is to pick the mixture that minimizes variance, which this is.

  2. It actually is the point that maximizes the product of the likelihoods (equivalently, the joint likelihood, since the estimates' errors are assumed independent). You can see this by remembering that the Normal pdf is the exponential of a negative quadratic, so you maximize the product of likelihoods by maximizing the sum of log-likelihoods, which happens where the two log-likelihood slopes cancel, which happens when the distances from the two means are inversely proportional to the respective x^2 coefficients (or, equivalently, when the weights are inversely proportional to the variances). There's a numeric check of this right after the list.

  3. There's a pseudo-frequentist(?) version of this, where you treat each estimate as an assembly of (higher-variance) estimates at the same point, notice that the count is inversely proportional to the variance, and take the total population mean as your estimator. (You might like the mean for its L2-minimizing properties.)

  4. A Bayesian interpretation is that, given an improper prior that is uniform over the whole real line and treating the two estimates as independent pieces of evidence, the given formula is the mode of the posterior (and, since the posterior is Normal, its mean and median as well).
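
Here's the numeric check promised under (2) (my own sketch; the means and standard deviations are made up): the point that maximizes the product of the two Normal likelihoods is exactly the inverse-variance weighted mean.

```python
from scipy.optimize import minimize_scalar
from scipy.stats import norm

m1, s1 = 3.0, 1.0   # observer 1's posterior: mean, standard deviation (made up)
m2, s2 = 5.0, 2.0   # observer 2's

def neg_log_lik(x):
    # Negative joint log-likelihood of a single true value x under both estimates.
    return -(norm.logpdf(x, m1, s1) + norm.logpdf(x, m2, s2))

x_star = minimize_scalar(neg_log_lik).x

w1, w2 = 1 / s1**2, 1 / s2**2                    # inverse-variance weights
print(x_star, (w1 * m1 + w2 * m2) / (w1 + w2))   # both are ~3.4
```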

Are any of those compelling?

Open & Welcome Thread - November 2019

Are you asking for a justification for averaging independent estimates to achieve an estimate with lower errors? "Blended estimate" isn't a specific term of art, but the general idea here is so common that I'm not sure _what_ the most common term for it is.

And the theoretical justification -- under assumptions of independent and Normal errors -- is in the post, where the author demonstrates that the weighted average has lower error (and that their choice of weights minimizes that error). Am I missing something here?
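
For a concrete (made-up-numbers) illustration of that justification, here's a quick Monte Carlo sketch comparing the mean squared error of two independent, unbiased, Normal-error estimates against their inverse-variance weighted average:

```python
import numpy as np

rng = np.random.default_rng(0)
truth = 7.0           # made-up true value
s1, s2 = 1.0, 2.0     # error standard deviations of the two estimators (made up)

e1 = truth + rng.normal(0, s1, 1_000_000)
e2 = truth + rng.normal(0, s2, 1_000_000)

w1, w2 = 1 / s1**2, 1 / s2**2
blended = (w1 * e1 + w2 * e2) / (w1 + w2)

for name, est in [("estimate 1", e1), ("estimate 2", e2), ("blended", blended)]:
    print(name, np.mean((est - truth) ** 2))
# MSEs come out ~1.0, ~4.0, and ~0.8; the blend beats either input, and
# 0.8 = 1 / (1/s1^2 + 1/s2^2) is the minimum achievable by any mixture.
```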

AlphaStar: Impressive for RL progress, not for AGI progress

Arimaa is the(?) classic example of a chess-like board game that was designed to be hard for AI (albeit from an age before "AI" mostly meant ML).

From David Wu's paper on the bot that finally beat top humans in 2015:

Why is Arimaa computer-resistant? We can identify two major obstacles.

The first is that in Arimaa, the per-turn branching factor is extremely large due to the combinatorial possibilities produced by having four steps per turn. Even after identifying equivalent permutations of steps as the same move, on average there are about 17000 legal moves per turn (Haskin, 2006). This is a serious impediment to search.

Obviously, a high branching factor alone doesn’t imply computer-resistance, particularly if the standard of comparison is with human play: high branching factors affect humans as well. However, Arimaa has a property common to many computer-resistant games: that “per amount of branching” the board changes slowly. Indeed, pieces move only one orthogonal step at a time. This makes it possible to effectively plan ahead, cache evaluations of local positions, and visualize patterns of good moves, all things that usually favor human players.

The second obstacle is that Arimaa is frequently quite positional or strategic, as opposed to tactical. Capturing or trading pieces is somewhat more difficult in Arimaa than in, for example, Chess. Moreover, since the elephant cannot be pushed or pulled and can defend any trap, deadlocks between defending elephants are common, giving rise to positions sparse in easy tactical landmarks. Progress in such positions requires good long-term judgement and strategic understanding to guide the gradual maneuvering of pieces, posing a challenge for positional evaluation.

When is pair-programming superior to regular programming?

It's easy to play armchair statistician and contribute little, but I want to point out that the empirics cited here are effectively just anecdotes. The paper studies 13 pairs and 13 individuals in three assignments in one class at UUtah. Its estimate of relative time costs is only significant to ~ because development time has variance of (if I backsolved correctly) 65%, which...seems about right. Still, it seems like borderline abuse of frequentist statistics to argue that a two-tailed p<0.05 should be required to reject the hypothesis that pairs finish projects in half the wall-clock time of individuals (which is the null the analysis assumes).
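
To make the armchair statistics slightly more concrete, here's a rough uncertainty sketch. Every number in it is an assumption lifted from this comment (13 subjects per arm, completion-time spread around 65% of the mean), not from the paper itself:

```python
import numpy as np

n, rel_sd = 13, 0.65            # assumed: subjects per arm, relative sd of completion time

se_mean = rel_sd / np.sqrt(n)   # ~0.18: each arm's mean time is known to roughly 18%
se_ratio = np.sqrt(2) * se_mean # ~0.25: rough one-sigma uncertainty on the pair/individual
                                # ratio of mean completion times (treating the arms alike)

print(se_mean, se_ratio)
# With ~25% uncertainty on that ratio, even fairly large true differences in
# time cost would routinely fail a two-tailed p < 0.05 test against whichever
# null the analysis picks.
```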

That said, the author correctly identifies that quality matters significantly more than speed. The quality metric, however, is "assignment tests passed" in throwaway academic projects, eliding the questions of what quality failures would or wouldn't be caught by the review / CI workflows that an industrial project would be going through anyway.

So, finger to the wind, this study feels like it suggests that a pair spends 15% more person-hours (once they get used to each other) before turning their schoolwork in, and does 15% more of the work of the assignment than a student working alone. Consistent with the higher reported work-enjoyment numbers! Definitely a stronger showing than I would have guessed! But definitely not well-abstracted by "no significant result for time; significant improvement for quality".

What am I missing here?

Eukryt Wrts Blg

(continued, to address a different point)

B and C seem like arguments against "simple" (i.e., even-odds) bets as well as weird (e.g., "70% probability") bets, except for C's "like bets where I'm surer...about what's going on", which is addressed by A (sibling comment).

Your point about differences in wealth giving different people different thresholds for meaningfulness is valid, though I've found that it matters much less in practice than you'd expect. It turns out that people making upwards of $100k/yr still do not feel good about opening up their wallet to give you $3. In fact, it feels bad enough that if you have to do it more than a few times in a row, you really feel the need to examine your own calibration, which is exactly the success condition.

I've found that the small ritual of exchanging pieces of paper just carries significantly more weight than its relation to my total savings would imply. (For this, it's surprisingly important to exchange actual pieces of paper; electronic payments make the whole thing less real, which defeats the point.)

Finally, it's hard to argue with someone's utility function, but I think that some rationalists get this one badly wrong by failing to actually multiply real numbers. For example, if you make a $10 bet (as defined in my sibling comment) every day for a year at the true probabilities, the standard deviation of your profit/loss on the year is <$200, or $200/365 ≈ $0.55 per day, which seems like a very small annual cost for practicing calibration and finding out how well-calibrated you actually are.
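
For the arithmetic behind that last number (assuming, since the sibling comment isn't quoted here, that each day's "$10 bet" has a standard deviation of at most $10):

```python
import math

daily_sd = 10.0     # assumed standard deviation of a single day's bet, in dollars
days = 365

annual_sd = daily_sd * math.sqrt(days)   # independent bets: sds add in quadrature
print(annual_sd)           # ~191: under $200 over the whole year
print(annual_sd / days)    # ~0.52: about fifty cents per day
```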
