Sufficiently Advanced Language Models Can Do Reinforcement Learning

As I've argued previously, a natural selection process maps cleanly onto RL in the limit.

The URL is broken (points to edit page)

19moHopefully that's fixed! I wrote this as quickly as possible so there may be many
tiny errors. Apologies. Let me know if anything else is wrong.

High Stock Prices Make Sense Right Now

Regarding safer assets, when you put your money into a savings account (loan it to the bank), what is the bank to do with it? Presumably it has promised you interest. Or if you buy treasuries - someone must have sold them to you - what do they do now with all the cash? Just because you personally didn't put your money into stocks doesn't mean nobody else downstream from you did.

And because most securities aren't up for sale at any given time, a small fraction of market participants can have outsized effects on prices. Consider oil back in Apr...

Covid-19: My Current Model

Here's a (high) schools data point... https://twitter.com/EricTopol/status/1266976828549238785?s=19

Covid-19: My Current Model

Regarding HCQ, the recent large-N studies were observational, and it looks like patients there were given HCQ late, and if they were relatively sicker. Using it early on could still work (but now there won't be an RCT for that thanks to numerous delendae).

Regarding schools, did the countries that reopened those already fare particularly worse?

I don't have that info regarding schools but also no one is systematically collecting data on anything and everything is confounded, including by control systems.

On HCQ, as I noted in the other comment on it, I'm mostly predicting/observing that the scientific community has decided it's going to reject HCQ, preventing it from becoming a consensus treatment. This is partly for 'good' reasons, partly for not-so-good reasons that have nothing to do with science, partly because they no longer know how or are not allowed to study thing...

What are the best tools for recording predictions?

A while back TinyCast seemed pretty friendly: https://tinycast.cultivateforecasts.com/questions/new

OpenAI announces GPT-3

*steer*

badum-tsss

Iceland's COVID-19 random sampling results: C19 similar to Influenza

That's terrible news! It means that on top of the meager coronavirus there's another unidentified disease overcrowding the hospitals, causing respirator shortages all over the world, and threatening to kill millions of people!

Virus As A Power Optimisation Process: The Problem Of Next Wave

> The idea of “flattening the curve” is the worst, as it assumes a large number of infections AND a large number of virus generation AND high selective pressure

Flattening _per se_ doesn't affect the evolution of the virus much. It doesn't evolve on a time grid, but rather on an event grid where an event is spreading from a person to another. As long as it spreads the same number of times it will have the same number of opportunities to evolve.

Let My People Stay Home

"Overreacting to underestimates" - great way of putting it!

Frivolous speculation about the long-term effects of coronavirus

Fewer waiting lines?

Is it worthwhile to save the cord blood and tissue?

Congratulations!

If you're trying to be homo economicus and maximize your expected utility, probably it's not worth it. But if you're not, you can still do it! We did (blood and tissue).

- How valuable are the stem cells right now
  - not very valuable
- and how valuable are they expected to be in the future?
  - very valuable but that's been the expectation for a long time and yet here we are
- How hard is it to get stem cells for yourself / your child right now vs in the future?
  - anything you harvest later on will have had more cellular divisions in its hi

[AN #77]: Double descent: a unification of statistical theory and modern ML practice

I don't see how it would explain double descent on training time. This would imply that gradient descent on neural nets first has to memorize noise in one particular way, and then further training "fixes" the weights to memorize noise in a different way that generalizes better

For example, the (random, meaningless) weights used to memorize noise can get spread across more degrees of freedom, so that on the test set their sum will be closer to 0.

21yThat does not intuitively make sense to me. I'd need to see an example or more
fleshed out argument to be convinced.
(Also, it sounds like an argument for model-wise double descent, but not
epoch-wise double descent.)

Has Moore's Law actually slowed down?

The 5nm in "5nm scale" no longer means "things are literally 5nm in size". Rather, it's become a fancy way of saying something like "200x the linear transistor density of an old 1-micron scale chip". The gates are still larger than 5nm, it's just that things are now getting put on their side to make more room ( https://en.wikipedia.org/wiki/FinFET ). Some chip measures sure are slowing down, but Moore's law (referring to the number of transistors per chip and nothing else) still isn't one of them despite claims of impending doom due to "quantum effects" originally dating back to (IIRC) the eighties.

I'm looking for alternative funding strategies for cryonics.

I know some people who (at least used to) maintain a group pool of cash to fund the preservation of whoever died first (at which point the pool would need to be refilled). So if you're unlucky (first to die out of N people), you only pay 1/N of the full price, and if you're lucky (last to die) you eventually pay about ln N times the price, but at least you get more time to earn the money. Not sure how it was all structured legally. Of course, if you're really pressed for time it may be hard to convince other people to join such an arrangement.
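
To see how costs could be distributed in a pool like this, here is a minimal sketch. It rests on assumptions that are not in the comment: the pool always holds exactly one full price, the members alive at each refill split it equally, and nobody leaves early. The function name `total_paid_by_death_order` is made up for illustration.

```python
from fractions import Fraction

def total_paid_by_death_order(n):
    """Total contribution of each member, in units of one full price.

    Index 0 is the first member to die, index n - 1 the last.
    Assumption: every fill of the pool is split equally among the
    members who are alive at that moment.
    """
    paid = []
    running = Fraction(0)
    for k in range(n):
        # The fill that funds death number k is shared by n - k members.
        running += Fraction(1, n - k)
        paid.append(running)
    return paid

paid = total_paid_by_death_order(10)
first_share = paid[0]    # what the first to die has paid
last_share = paid[-1]    # what the last to die has paid (a harmonic number)
print(float(first_share), float(last_share))
```

Under these assumptions the first to die pays 1/N of the price, while the last pays the harmonic sum 1/N + 1/(N-1) + ... + 1, which grows roughly like the natural logarithm of N. The contributions add up to N full prices, so the pool only redistributes cost and timing.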

Fu...

There aren't that "many" other companies. Talk to KrioRus, I know they explored setting up a cryonics facility in Switzerland at some point.

Swarm AI (tool)

I'm pretty sure (epistemic status: Good Judgment Project Superforecaster) the "AI" in the name is pure buzz and the underlying aggregation algorithm is something very simple. If you want to set up some quick group predictions for free, there's https://tinycast.cultivatelabs.com/ which has a transparent and battle-tested aggregation mechanism (LMSR prediction markets) and doesn't use catchy buzzwords to market itself. For other styles of aggregation there's "the original" Good Judgment Inc, a spinoff from GJP which ac...

Book Trilogy Review: Remembrance of Earth’s Past (The Three Body Problem)

The books are marketed as "hard" sci-fi but it seems all the "science" (at least in the first book, didn't read the others) is just mountains of mysticism constructed around statements that can sound "deep" on some superficial level but aren't at all mysterious, like "three-body systems interacting via central forces are generally unstable" or "you can encode some information into the quantum state of a particle" (yet of course they do contain nuance that's completely lost on the author, such...

12yHuh. I don't think I ever heard someone call this series hard sci-fi where I
could hear them; the most common recommendation was related to its Chineseness,
which, as Zvi claims, definitely delivers.
And I'm not sure I'd take Niven as the archetype of truly hard sci-fi; have you
ever tried Egan? Diaspora says sensible things about philosophy of mind for
emulated, branching AIs with a plot arc where the power laws of a
5+1-dimensional universe become relevant, and Clockwork Rocket invents alternate
laws of special relativity incidentally to a story involving truly creative
alt-biology...

Beliefs at different timescales

(epistemic status: physicist, do simulations for a living)

Our long-term thermodynamic model Pn is less accurate than a simulation

I think it would be fair to say that the Boltzmann distribution and your instantiation of the system contain not more/less but _different kinds of_ information.

Your simulation (assume infinite precision for simplicity) is just one instantiation of a trajectory of your system. There's nothing stochastic about it, it's merely an internally-consistent static set of configurations, connected to each other by deterministic e...

23yThanks, I didn't know that about the partition function.
In the post I was thinking about a situation where we know the microstate to
some precision, so the simulation is accurate. I realize this isn't realistic.

The Second Law of Thermodynamics, and Engines of Cognition

(the paper: https://journals.aps.org/pr/abstract/10.1103/PhysRev.106.620)

The Second Law of Thermodynamics, and Engines of Cognition

There's nothing magical about reversing particle speeds. For entropy to decrease to the original value you would have to know and be able to change the speeds with perfect precision, which is of course meaningless in physics. If you get it even the tiniest bit off you might expect _some_ entropy decrease for a while but inevitably the system will go "off track" (in classical chaos the time it's going to take is only logarithmic in your precision) and onto a different increasing-entropy trajectory.
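
The "only logarithmic in your precision" point can be illustrated with a toy chaotic system. The sketch below uses the logistic map x -> 4x(1 - x) as a stand-in; that choice, the starting point 0.3, and the divergence threshold 0.1 are all assumptions made for illustration, not anything taken from the post or from a gas of particles.

```python
def steps_until_divergence(x0, eps, threshold=0.1, max_steps=1000):
    """Count iterations until two nearby trajectories differ by > threshold.

    Both trajectories follow the logistic map x -> 4x(1 - x); the second
    one starts eps away from the first.
    """
    a, b = x0, x0 + eps
    for n in range(1, max_steps + 1):
        a = 4.0 * a * (1.0 - a)
        b = 4.0 * b * (1.0 - b)
        if abs(a - b) > threshold:
            return n
    return max_steps

# Each step down in eps is a 10,000-fold gain in initial precision.
t_coarse = steps_until_divergence(0.3, 1e-4)
t_medium = steps_until_divergence(0.3, 1e-8)
t_fine = steps_until_divergence(0.3, 1e-12)
print(t_coarse, t_medium, t_fine)
```

Improving the initial precision by many orders of magnitude buys only a handful of extra iterations before the trajectories part ways, which is the sense in which a slightly imperfect velocity reversal goes "off track" quickly.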

Jaynes' 1957 paper has a nice formal explanation of entropy vs. velocity reversal.

Safely and usefully spectating on AIs optimizing over toy worlds

design the AI in such a way that it can create agents, but only

This sort of argument would be much more valuable if accompanied by a specific recipe of how to do it, or at least a proof that one must exist. Why worry about AI designing agents? Why not just "design it in such a way" that it's already Friendly?

23yI agree. I didn't mean to imply that I thought this step would be easy, and I
would also be interested in more concrete ways of doing it. It's possible that
creating a hereditarily restricted optimizer along the lines I was suggesting
could end up being approximately as difficult as creating an aligned
general-purpose optimizer, but I intuitively don't expect this to be the case.

Applying Bayes to an incompletely specified sample space

I agree, it did seem like one of the more-unfinished parts. Still, perhaps a better starting point than nothing at all?

Applying Bayes to an incompletely specified sample space

Check the chapter on the A_p distribution in Jaynes' book.

43yI've always thought that chapter was a weak point in the book. Jaynes doesn't
treat probabilities of probabilities in quite the right way (for one thing
they're really probabilities of frequencies). So take it with a grain of salt.

The Value of Those in Effective Altruism

Losing a typical EA ... decreasing ~1000 utilons to ~3.5, so a ~28500% reduction per person lost.

You seem to be exaggerating a bit here: that's a 99.65% reduction. Hope it's the only inaccuracy in your estimates!

15yAs the comment below indicates, I think we don't disagree on the math, it's the
semantics issue. When I talk about reduction per person lost, I compare the
utilons from a typical person to the typical EA, which is a 996.5 utilon
difference. So comparing that is 996.5/3.5 * 100% = 28471%.

25ytomayto tomahto
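
For readers following the arithmetic in this exchange, both figures come from the same 996.5-utilon gap and differ only in the denominator. A minimal sketch (the function names are made up for illustration):

```python
def percent_reduction(before, after):
    """Drop from `before` to `after`, as a percentage of `before`."""
    return (before - after) / before * 100

def percent_excess(base, other):
    """How much larger `other` is than `base`, as a percentage of `base`."""
    return (other - base) / base * 100

before, after = 1000.0, 3.5
reduction = percent_reduction(before, after)  # gap relative to the starting 1000
excess = percent_excess(after, before)        # gap relative to the remaining 3.5
print(f"{reduction:.2f}% reduction, {excess:.0f}% excess")
```

The first number is the 99.65% reduction; the second is the roughly 28471% figure, which measures how much larger 1000 is than 3.5 rather than how far 1000 fell.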

Behavior: The Control of Perception

Here's another excellent book roughly from the same time: "The Phenomenon of Science" by Valentin F. Turchin (http://pespmc1.vub.ac.be/posbook.html). It starts from largely similar concepts and proceeds through the evolution of the nervous system to language to math to science. I suspect it may be even more AI-relevant than Powers.

06yThanks for the link (which has the free pdf, for anyone else interested)! After
a few months at being only at a book or two, my reading queue is up towards a
dozen again, so I'm not sure when I'll get to reading it.

Bayesianism for humans: "probable enough"

Hi shminux. Sorry, just saw your comment. We don't seem to have a date set for November yet, but let me check with the others. Typically we meet on Saturdays; are you still around on the 22nd? Or we could try Sunday the 16th. Let me know.

07yI'm leaving on Thu very early, so Sunday is better. However, I might be occupied
with some family stuff instead, so please do not change your plans because of
me. I'll check the Google group messages and contact you if I can make it.
Thanks!

Bayesianism for humans: "probable enough"

The Planning Fallacy explanation makes a lot of sense.

17yOff-topic: you seem to be one of the organizers of the Houston meetup. I'll be
in town the week of Nov 16, feel free to let me know if there is anything
scheduled.

Meetup : Houston, TX

I hope it's not *really* at 2AM.

Too good to be true

While the situation admittedly is oversimplified, it does seem to have the advantage that anyone can replicate it exactly at a very moderate expense (a two-headed coin will also do, with a minimum amount of caution). In that respect it may actually be more relevant to the real world than any vaccine/autism study.

Indeed, every experiment should get a pretty strong p-value (though never exactly 1), but what gets reported is not the actual p but whether it is above .95 (which is an arbitrary threshold proposed once by Fisher who never intended it to play the role...

Too good to be true

(1) is obvious, of course--in hindsight. However changing your confidence level after the observation is generally advised against. But (2) seems to be confusing Type I and Type II error rates.

On another level, I suppose it can be said that *of course* they are all biased! But, by the actual two-tailed coin rather than researchers' prejudice against normal coins.

Too good to be true

Treating ">= 95%" as "= 95%" is a reasoning error

Hence my question in another thread: Was that "exactly 95% confidence" or "at least 95% confidence"? However when researchers say "at a 95% confidence level" they typically mean "*p* < 0.05", and reporting the actual *p*-values is often even explicitly discouraged (let's not digress into whether it is justified).

Yet *the* mistake I had in mind (as opposed to other, less relevant, merely "*a*" mistakes) involves Type I and Type II error ra...

Too good to be true

Well, perhaps a bit too simple. Consider this. You set your confidence level at 95% and start throwing a coin. You observe 100 tails out of 100. You publish a report saying "the coin has tails on both sides at a 95% confidence level" because that's what you chose during design. Then 99 other researchers repeat your experiment with the same coin, arriving at the same 95%-confidence conclusion. But you would expect to see about 5 reports claiming otherwise! The paradox is resolved when somebody comes up with a trick using a mirror to observe both sides of the coin at once, finally concluding that the coin *is* two-tailed with a 100% confidence.

What was the mistake?

17yThe actual situation is described this way:
I have a coin which I claim is fair: that is, there is equal chance that it
lands on heads and tails, and each flip is independent of every other flip.
But when we look at 60 trials of the coin flipped 5 times (that is, 300 total
flips), we see that there are no trials in which either 0 heads were flipped or
5 heads were flipped. Every time, it's 1 to 4 heads.
This is odd- for a fair coin, there's a 6.25% chance that we would see 5 tails
in a row or 5 heads in a row in a set of 5 flips. To not see that 60 times in a
row has a probability of only 2.1%, which is rather unlikely! We can state with
some confidence that this coin does not look fair; there is some structure to it
that suggests the flips are not independent of each other.
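
The two probabilities quoted in this reply can be checked directly. The sketch below assumes exactly what the reply states: a fair coin, independent flips, 5 flips per trial, and 60 trials.

```python
flips_per_trial = 5
trials = 60

# Chance that one trial of a fair coin is all heads or all tails.
p_extreme = 2 * 0.5 ** flips_per_trial

# Chance that none of the 60 independent trials is all heads or all tails.
p_no_extreme_in_all_trials = (1 - p_extreme) ** trials

print(f"{p_extreme:.2%} per trial, {p_no_extreme_in_all_trials:.1%} for all trials")
```

This gives 6.25% per trial and about 2.1% for seeing no all-heads or all-tails trial in 60 attempts, matching the figures in the reply.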

17yOne mistake is treating 95% as the chance of the study indicating two-tailed
coins, given that they were two-tailed coins. More likely it was meant as the
chance of the study not indicating two-tailed coins, given that they were not
two-tailed coins.
Try this:
You want to test if a coin is biased towards heads. You flip it 5 times, and
consider 5 heads as a positive result, 4 heads or fewer as negative. You're
aiming for 95% confidence but have to get 31/32 = 96.875%. Treating 4 heads as a
possible result wouldn't work either, as that would get you less than 95%
confidence.
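
The numbers in this 5-flip example follow from the binomial distribution under a fair coin. A small sketch, where "confidence" is taken to mean one minus the false-positive rate (an interpretation of the reply, and the function name is made up):

```python
from math import comb

def confidence(n_flips, min_heads_for_positive):
    """1 minus the chance that a fair coin produces a 'positive' result."""
    false_positive = sum(
        comb(n_flips, k) for k in range(min_heads_for_positive, n_flips + 1)
    ) / 2 ** n_flips
    return 1 - false_positive

strict = confidence(5, 5)   # only 5 heads counts as positive
loose = confidence(5, 4)    # 4 or 5 heads counts as positive
print(strict, loose)
```

With only 5 flips the achievable levels jump from 81.25% (4 or more heads) to 96.875% (exactly 5 heads), so a study aiming for 95% necessarily lands above it rather than on it.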

57yI don't know if the original post was changed, but it explicitly addresses this
point:

17yThis doesn't seem like a good analogy to any real-world situation. The null
hypothesis ("the coin really has two tails") predicts the exact same outcome
every time, so every experiment should get a p-value of 1, unless the
null-hypothesis is false, in which case someone will eventually get a p-value of
0. This is a bit of a pathological case which bears little resemblance to real
statistical studies.

07yI don't see a paradox. After 100 experiments one can conclude that either the
confidence level was set too low, or the papers are all biased toward two-tailed
coins. But which is it?

-17yNeglecting all of the hypotheses which would result in the mirrored observation
which do not involve the coin being two tailed. The mistake in your question is
the "the". The final overconfidence is the least of the mistakes in the story.
Mistakes more relevant to practical empiricism: Treating ">= 95%" as "= 95%" is
a reasoning error, resulting in overtly wrong beliefs. Choosing to abandon all
information apart from the single boolean is a (less serious) efficiency error.
Listeners can still be subjectively-objectively 'correct', but they will be less
informed.

Too good to be true

How does your choice of threshold (made beforehand) affect your actual data and the information about the actual phenomenon contained therein?

Meetup : Houston, TX

suggestion posted to the Google Group:

Another idea might be to decide ahead of each meetup on a few topics for discussion to allow some time to prepare, research and think about things for some time before discussing with each other.

Too good to be true

Also, different studies have different statistical power, so it may not be OK to simply add up their evidence with equal weights.

8[anonymous]7yp-values are supposed to be distributed uniformly from 0 to 1 conditional on the
null hypothesis being true.
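
The uniformity claim can be checked by simulation. The sketch below is one possible setup, chosen for convenience rather than taken from the thread: each simulated study draws 10 standard-normal observations with true mean 0 (so the null hypothesis holds) and runs a two-sided z-test with known variance.

```python
import math
import random

def two_sided_z_test_p_value(sample):
    """p-value for H0: mean = 0, assuming unit variance is known."""
    n = len(sample)
    z = sum(sample) / n * math.sqrt(n)
    # 2 * (1 - Phi(|z|)) written with the complementary error function.
    return math.erfc(abs(z) / math.sqrt(2))

random.seed(0)
p_values = [
    two_sided_z_test_p_value([random.gauss(0.0, 1.0) for _ in range(10)])
    for _ in range(20000)
]

fraction_below_05 = sum(p < 0.05 for p in p_values) / len(p_values)
mean_p = sum(p_values) / len(p_values)
print(fraction_below_05, mean_p)
```

If p-values are uniform under the null, about 5% of studies should fall below 0.05 and the average p-value should be near 0.5, which is what the simulation is expected to show up to sampling noise.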

17yNo; it's standard to set the threshold for your statistical test for 95%
confidence. Studies with larger samples can detect smaller differences between
groups with that same statistical power.

Too good to be true

Was that "exactly 95% confidence" or "at least 95% confidence"?

Meetup : Houston, TX

*(I highly recommend that everyone join the Google Group so that we can all communicate in a single place by email)*

Does anyone else feel like making this meeting a little more structured?

For example, something as simple as brief but *prepared* self-introductions covering your interests (related or unrelated to LW) and anything else about yourself that you might consider worth a mention. We partially covered it last time but it was pretty chaotic.

Or maybe someone even wants to give a brief talk about something they find exciting. Back in the day Jon...

0[anonymous]7yI was thinking about the introductions too; I think this time we probably can do
something a bit more structured.

Meetup : Houston, TX

Oh yes, and last time somebody discovered that there's free parking on Main St across from campus (the stretch between Med Center and Hotel ZaZa).

Meetup : Houston, TX

Hopefully, this time Valhalla should be open for, um, follow-up discussions. http://valhalla.rice.edu/

The Power of Noise

It seems that in the rock-scissors-paper example the opponent is quite literally an adversarial superintelligence. They are more intelligent than you (at this game), and since they are playing against you, they are adversarial. The RCT example also has a lot of actors with different conflicts of interest, especially money- and career-wise, and some can come pretty close to adversarial.

6[anonymous]7y"adversarial superintelligence" sounds like something you don't have to worry
about facing pre-singularity. "someone who's better than you at
rock-paper-scissors" sounds rather more mundane. Using the former term makes the
situation look irrelevant by sneaking in connotations.

Meetup : Houston, TX

Free parking is available in the small streets across Rice Boulevard from the campus (north of it). This is also closer.

Common sense quantum mechanics

Here are some nice arguments about different what-if/why-not scenarios, not fully rigorous but sometimes quite persuasive: http://www.scottaaronson.com/democritus/lec9.html

Common sense quantum mechanics

I'm not sure if we can say much about a classical universe "in practice" because in practice we do not live in a classical universe. I imagine you could have perfect information if you looked at some simple classical universe from the outside.

For classical universes with complete information you have Newtonian dynamics. For classical universes with incomplete information about the state you can still use Newtonian dynamics but represent the state of the system with a probability distribution. This ultimately leads to (classical) statistical mecha...

Common sense quantum mechanics

Thanks! The list of assumptions seems longer than in the De Raedt *et al.* paper and you need to first postulate branching and unitarity (let's set aside how reasonable/justified this postulate is) in addition to rational reasoning. But it looks like you can get there eventually.

Common sense quantum mechanics

Luke, please correct me if I'm misunderstanding something.

The rule follows directly if you require that the wavefunction behaves like a "vector probability". Then you look for a measure that behaves like probability should (basically, nonnegative and adding up to 1). And you find that for this the wavefunction should be complex-valued and the probability should be its squared amplitude. You can also show that anything "larger" than complex numbers (e.g. quaternions) will not work.

But, as you said, the question is not how to derive the B...

07yThe two requirements are that it be on the domain of probabilities (reals on
0-1), and that they nest properly.
Quaternions would be OK as far as the Born rule is concerned - why not? They
have a magnitude too. If we run into trouble with them, it's with some other
part of QM, not the Born rule (and I'm not entirely confident that we do - I
have hazy recollection of a formulation of the Dirac equation using quaternions
instead of complex numbers).

Common sense quantum mechanics

I certainly would not rule out number 5 ;) As for 3, the arguments seem to apply to any universe in which you can carry out a reproducible experiment. However, in a "classical universe" everything is, in principle, exactly knowable, and so you just don't *need* a probabilistic description.

Unless there is limited information, in which case you use statistical mechanics. With perfect information you know which microstate the system is in, the evolution is deterministic, there is no entropy (macrostate concept), hence no second law, etc. Only when you...

17yIn case it's less than perfectly clear, I am very much not ruling out number 5;
that's why it's there. But for obvious reasons there's not much I can say about
how it might be true and what the consequences would be.
Even in a classical universe your knowledge is always going to be incomplete in
practice. (Perfectly precise measurement is not in general possible. Your brain
has fewer possible states than the whole universe. Etc.) So probabilistic
reasoning, or something very like it, is inescapable even classically.
Regardless, though, it would be pretty surprising to me if mere
"underconfidence" (supposing it to be so) required a quantum [EDITED TO ADD:
model of the] universe.

Meetup : Houston, TX

Here is the Houston LW Google Group: https://groups.google.com/forum/?forum/houston-lesswrong#!forum/houston-lesswrong

Common sense quantum mechanics

Can this argument be summarized in some condensed form? The paper is long.

47yI'm not sure that the proof can be summarised in a comment, but the theorem can:
Suppose you are an agent that knows that you are living in an Everettian
universe. You have a choice between unitary transformations (the only type of
evolution that the world is allowed to undergo in MWI), that will in general
cause your 'world' to split and give you various rewards or punishments in the
various resulting branches. Your preferences between unitary transformations
satisfy a few constraints:
* Some technical ones about which unitary transformations are available.
* Your preferences should be a total ordering on the set of the available
unitary transformations.
* If you currently have unitary transformation U available, and after
performing U you will have unitary transformations V and V' available, and
you know that you will later prefer V to V', then you should currently prefer
(U and then V) to (U and then V').
* If there are two microstates that give rise to the same macrostate, you don't
care about which one you end up in.
* You don't care about branching in and of itself: if I offer to flip a quantum
coin and give you reward R whether it lands heads or tails, you should be
indifferent between me doing that and just giving you reward R.
* You only care about which state the universe ends up in.
* If you prefer U to V, then changing U and V by some sufficiently small amount
does not change this preference.
Then, you act exactly as if you have a utility function on the set of rewards,
and you are evaluating each unitary transformation based on the weighted sum of
the utility of the reward you get in each resulting branch, where you weight by
the Born 'probability' of each branch.

This is the correct answer to the question. Bell and CHSH and all are remarkable but more complicated setups. This (entanglement no matter which basis you end up measuring your particle in, a choice not known at the time of state preparation) is what's salient about the simple 2-particle setup.
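
One way to make the "no matter which basis" point concrete is the sketch below. It assumes the 2-particle setup is a spin singlet and restricts the basis change to real rotations applied to both particles; both are assumptions made for illustration, and the helper `rotate_both` is made up.

```python
import math

def rotate_both(state, theta):
    """Apply the same real rotation R(theta) to both qubits.

    `state` holds the amplitudes [a00, a01, a10, a11].
    """
    c, s = math.cos(theta), math.sin(theta)
    rot = [[c, -s], [s, c]]
    out = [0.0] * 4
    for i in range(2):
        for j in range(2):
            out[2 * i + j] = sum(
                rot[i][k] * rot[j][l] * state[2 * k + l]
                for k in range(2)
                for l in range(2)
            )
    return out

singlet = [0.0, 1 / math.sqrt(2), -1 / math.sqrt(2), 0.0]  # (|01> - |10>)/sqrt(2)
product = [0.0, 1.0, 0.0, 0.0]                             # |01>, not entangled

rotated_singlet = rotate_both(singlet, 0.7)
rotated_product = rotate_both(product, 0.7)
print(rotated_singlet)
print(rotated_product)
```

The singlet's amplitudes come out unchanged, so it is the same anticorrelated state in the rotated basis, whereas the unentangled product state picks up components on all four basis vectors and loses its perfect anticorrelation.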