samshap

Predictive Coding has been Unified with Backpropagation

Kind of. Neuromorphic hardware doesn't buy you much for generic feedforward networks, but it dramatically reduces the cost of convergence. Since the 100x overhead in this paper comes from iterating until the network converges, a neuromorphic implementation (say, on Loihi) would directly eliminate that cost.

Predictive Coding has been Unified with Backpropagation

TLDR for this paper: There is a separate set of 'error' neurons that communicate backwards. Their values converge on the appropriate back propagation terms.

A large error at the top levels corresponds to 'surprise', while a large error at the lower levels corresponds more to the 'override'.
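To make the TLDR concrete, here is a minimal sketch of error units relaxing to the backprop terms. It is not the paper's code: the scalar tanh chain, the squared-error loss, the sign convention (top error = prediction minus target), and every function name are my own assumptions, and the predictions are held fixed during relaxation.

```python
import math

def forward(x, weights):
    """Forward pass of a scalar chain: vhat[i+1] = tanh(w[i] * vhat[i])."""
    vhat = [x]
    for w in weights:
        vhat.append(math.tanh(w * vhat[-1]))
    return vhat

def local_derivs(vhat, weights):
    """d vhat[i+1] / d v[i], evaluated at the (fixed) forward-pass values."""
    return [w * (1.0 - vhat[i + 1] ** 2) for i, w in enumerate(weights)]

def backprop_grads(vhat, weights, target):
    """Reference gradients dL/dv[i] for the assumed loss L = 0.5 * (vhat_top - target)**2."""
    d = local_derivs(vhat, weights)
    grads = [0.0] * len(vhat)
    grads[-1] = vhat[-1] - target
    for i in range(len(weights) - 1, -1, -1):
        grads[i] = grads[i + 1] * d[i]
    return grads

def relax_errors(vhat, weights, target, steps=500, dt=0.1):
    """Iterate the error units; each one only sees the error unit directly above it."""
    d = local_derivs(vhat, weights)
    eps = [0.0] * len(vhat)
    eps[-1] = vhat[-1] - target  # top error is clamped (assumed sign convention)
    for _ in range(steps):
        new = list(eps)  # synchronous update of all lower error units
        for i in range(len(weights)):
            new[i] = eps[i] + dt * (-eps[i] + eps[i + 1] * d[i])
        eps = new
    return eps

weights = [0.8, -1.2, 0.6]  # arbitrary illustrative values
vhat = forward(0.5, weights)
print(backprop_grads(vhat, weights, target=1.0))
print(relax_errors(vhat, weights, target=1.0))
```

The two printed lists should agree to numerical precision. The `steps` loop is the convergence cost that a digital, synchronous implementation pays on every input.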

Predictive Coding has been Unified with Backpropagation

I think that's premature. This is just one (digital, synchronous) implementation of one model of BNN that can be shown to converge on the same result as backprop. In a neuromorphic implementation of this circuit, the convergence would occur on the same time scale as the forward propagation.

Predictive Coding has been Unified with Backpropagation

Right side of equation 2. Also the v update step in algorithm 1 should have a negative sign (the text version earlier on the same page has it right).

Predictive Coding has been Unified with Backpropagation

Thanks for sharing!

Two comments:

- There seem to be a couple of sign errors in the manuscript. (Probably worth reaching out to the authors directly.)
- Their predictive coding algorithm holds the vhat values fixed during convergence, which actually implies a somewhat different network topology than the more traditional one shown in your figure.

samshap's Shortform

Do you have some source for saying the log scoring rule should only be used when no anthropics are involved? Without that, what does it even mean to have a well-calibrated belief?

(BTW, there are other nice features of using the log-scoring rule, such as rewarding models that minimize their cross-entropy with the territory).

samshap's Shortform

My argument is that the log scoring rule is not just a "*given way of measuring outcomes*". A belief that maximizes E(log(p)) is the definition of a proper Bayesian belief. There's no appeal to consequence other than "SB's beliefs are well calibrated".

samshap's Shortform

# Redissolving sleeping beauty (and maybe solving it entirely)

[epistemic status - I'm new to thinking about anthropics, but I don't see any obvious flaws]

"If a tree falls on sleeping beauty" famously claims to have dissolved the Sleeping Beauty problem - that SB's correct answer just depends on the reward structure for her answers, and that her actual credence doesn't matter.

Several lesswrongers seem unsatisfied with that answer - understandably, given a longstanding commitment to epistemics and Bayesianism!

I would argue that ata did some key work in answering the problem from a purely epistemic perspective.

Recall the question SB is to be asked upon waking:

Each interview consists of one question, “What is your credence now for the proposition that our coin landed heads?”

And one of the bets ata formulated:

Each interview consists of one question, “What is your credence now for the proposition that our coin landed heads?”, and the answer given will be scored according to a logarithmic scoring rule, with the aggregate result corresponding to the number of utilons (converted to dollars, let’s say) she will be penalized after the experiment.

These questions are actually equivalent! A properly calibrated belief is one that is optimal w.r.t. the logarithmic scoring rule.

ata goes on to show that the answer to that question is 1/3. This result, I think, is actually contingent on the meaning of 'aggregate'. **If 'aggregate' just means 'sum over all predictions ever', then ata's math checks out, the thirders are right, and the problem is solved.**

However, given the premise of SB - in case of tails, she forgets everything that happened on Monday - you could argue for 'aggregate' meaning 'sum over all predictions she remembers making', in which case the correct answer is one half. Or, if we include the log score for predictions that she was told she made (say, because the interviewers wrote them down and told her afterwards), then the answer becomes 1/3 again!
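The different readings of 'aggregate' can be checked numerically. This is a rough sketch, not ata's calculation: the function names and the grid search are mine, and the only thing that varies between readings is how many tails-branch interviews get counted (2 for 'all predictions ever' or 'all predictions she was told about', 1 for 'only the ones she remembers').

```python
import math

def expected_log_score(p_heads, tails_interviews_counted):
    """Expected aggregate log score for always answering p_heads.

    Heads (prob 1/2): one interview, scored log(p_heads).
    Tails (prob 1/2): each counted interview is scored log(1 - p_heads).
    """
    heads_term = 0.5 * math.log(p_heads)
    tails_term = 0.5 * tails_interviews_counted * math.log(1.0 - p_heads)
    return heads_term + tails_term

def best_credence(tails_interviews_counted, grid=10000):
    """Grid-search the credence in heads that maximizes the expected score."""
    candidates = [k / grid for k in range(1, grid)]  # open interval (0, 1)
    return max(candidates,
               key=lambda p: expected_log_score(p, tails_interviews_counted))

print(best_credence(2))  # sum over every prediction made
print(best_credence(1))  # sum over remembered predictions only
```

Setting the derivative of 0.5·log(p) + 0.5·n·log(1−p) to zero gives p = 1/(1+n), so counting n = 2 tails interviews yields 1/3 and counting n = 1 yields 1/2.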

**So the SB paradox boils down to what you, as an epistemic rationalist, consider the correct way to aggregate the entropy of predictions!**

The 'sum over all predictions' seems best to me (and thus I suppose I lean to the 1/3 answer), but I don't have a definitive reason as to why.

Defending the non-central fallacy

Suppose that you want to move to Hawaii because it's so beautiful, but you know (because you saw something on the internet) that upon arrival, someone will rob you. If knowing this information, you still move to Hawaii, does this mean that you are consenting to being robbed? Even if when you actually get to Hawaii, you make sure to explain to every potential robber that you really really don't want to be robbed?

Your argument here is both circular, and committing the noncentral fallacy!

To recap:

In a debate with rohimshah over whether taxation can be consensual (and therefore not theft), your argument reads:

- Taxation is analogous to robbery
- Robbery (even robbery that predictably occurs when I consume a good or service) is not consensual
- Therefore, taxation (even taxation that predictably occurs when I consume a good or service) is not consensual
- Therefore taxation is theft

I won't ding your OP for assuming that taxation is nonconsensual, since you were merely responding to Scott's arguments that had already conceded that point.

However, to argue that all taxes are always nonconsensual is clearly absurd.

Many taxes (especially local ones) are nearly identical to fees that private actors charge under similar terms (e.g. property taxes are equivalent to HOA fees and rents). Not to mention plenty of times when people explicitly consent to taxation!

If you want to strengthen your argument, limit it to: 'nonconsensual taxation is theft'.

Incorrect. Perceptrons are a low fidelity (but still incredibly useful!) rate-encoded model of individual neurons.