gathaung - LessWrong

The robust beauty of improper linear models

Nice. To make your proposed explanation more precise:

Take a random vector on the n-dim unit sphere. Project to the nearest (+1,-1)/sqrt(n) vector; what is the expected l2-distance / angle? How does it scale with n?

If this value decreases in n, then your explanation is essentially correct, or did you want to propose something else?

Start by taking a random vector x where each coordinate is unit gaussian (normalize later). The projection px just splits into positive coordinates and negative coordinates.

We are interested in E[<x, px> / (|x| sqrt(n))], where px is the sign vector of x (entries +1 or -1), so |px| = sqrt(n).

If the dimension is large enough, then we won't really need to normalize; it is enough to start with 1/sqrt(n) gaussians, as we will almost almost surely get almost unit length. Then all components are independent.

For the angle, we then (approximately) need to compute E(sum_i |x_i| / n), where each x_i is unit Gaussian. This is asymptotically independent of n; so it appears like this explanation of improper linear models fails.

Darn, after reading your comment I mistakenly believed that this would be yet another case of "obvious from high-dimensional geometry" / random projection.

PS. In what sense are improper linear models working? l_1, l_2, l_\infty sense?

Edit: I was being stupid, leaving the above for future ridicule. We want E(sum_i |x_i| / n)=1, not E(sum_i |x_i|/n)=0.

Folded Gaussian tells us that E[sum_i |x_i|/n] = sqrt(2/pi), for large n. The explanation still does not work, since sqrt(2/pi) < 1, and this gives us the expected error margin of improper high-dimensional models.
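The folded-Gaussian value can be checked numerically. A minimal sketch, assuming px is the sign vector of x (entries +1 or -1), so that the cosine of the angle is sum_i |x_i| / (|x| sqrt(n)); the function name, trial count, and seed are arbitrary choices made for illustration:

```python
import math
import random

def mean_cosine_to_sign_vector(n, trials=200, seed=0):
    """Average cosine between a Gaussian vector x and its sign vector."""
    rng = random.Random(seed)
    total = 0.0
    for _ in range(trials):
        x = [rng.gauss(0.0, 1.0) for _ in range(n)]
        norm = math.sqrt(sum(v * v for v in x))
        # <x, sign(x)> = sum |x_i|, and |sign(x)| = sqrt(n)
        total += sum(abs(v) for v in x) / (norm * math.sqrt(n))
    return total / trials

target = math.sqrt(2 / math.pi)
for n in (10, 100, 1000):
    print(n, round(mean_cosine_to_sign_vector(n), 4), round(target, 4))
```

For large n the averages should sit near sqrt(2/pi), about 0.798, i.e. an angle of roughly 37 degrees that does not shrink with n.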

@Stuart: What are the typical empirical errors? Do they happen to be near sqrt(2/pi), which is close enough to 1 to be summarized as "kinda works"?

Hidden universal expansion: stopping runaways

gathaung, 7y

Second question:

Do you have a nice reference (speculative feasibility study) for non-rigid coil-guns for acceleration?

Obvious idea would be to have a swarm of satellites with a coil, spread out over the solar system. Outgoing probe would pass through a series of such coils, each adding some impulse to the probe (and doing minor course corrections). Obviously needs very finely tuned trajectory.

Advantage over rigid coil-gun: acceleration spread out (unevenly) over longer length (almost entire solar system). This is good for heat dissipation (no coupling is perfect), and maintaining mega-scale rigid objects appears difficult. Satellites can take their time to regain position (solar sail / solar powered ion thruster / gravity assist). Does not help with g-forces.

Disadvantage: Need a large number of satellites in order to get enough launch windows. But if we are talking dyson swarm anyway, this does not matter.

How much do we gain compared to laser acceleration? Main question is probably: How does the required amount of heat dissipation compare?

Hidden universal expansion: stopping runaways

gathaung, 7y

Do you have a non-paywalled link, for posterity? I use sci-hub, but paywalls are a disgrace to science.

Also, do you have a nice reference for the bussard ramjet/ramscoop deceleration?

Obvious advantage: A priori you don't need nuclear fusion at all. You use a big em-field for cross-section and use, ultimately, drag against the interstellar medium for both deceleration and energy generation. No deceleration needed in (thinner) intergalactic medium. Entropy gain should be large enough to run mighty heat-pumps (for maintaining high field superconductors and radiating excess heat). No need to carry fuel or manage fusion; your kinetic energy at relativistic speeds has almost as much energy as antimatter. Antimatter sucks because production, containment, and difficulty of not frying yourself with the resulting radiation (light probe cannot shield against gamma), and probably a couple more reasons.

Disadvantage: not obvious whether this works. I would appreciate an actual engineer doing the computation. (I am just a mathematician, and have not seen a study of this deceleration design because I suck at searching the literature)

Probably at least three problems:

(1) How much impulse at what speeds? Determined by cross-section of collecting EM-field over required mass of collector.

(2) Might be good for decelerating from 0.9c to 0.05c over maybe 10k years (pulling numbers out of my ass). Would still need secondary system for the remaining deceleration, until slow enough for gravity assists. Could collect propellant over the long deceleration, but then would need to dissipate a shitload of heat; unclear whether net gain.

(3) Heat dissipation.

I agree that deceleration is the thing to care about; beat the rocket equation on deceleration by clever designs using the interstellar medium, and on acceleration by big machines.
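The energy comparison and the 0.9c to 0.05c figures above can be put into numbers. A rough sketch, assuming special-relativistic kinetic energy (gamma - 1) m c^2 and a naive constant deceleration in the rest frame of the interstellar medium; the 10k-year duration is the guess from the comment, and the helper names are made up for illustration. For reference, fully annihilating matter-antimatter fuel of total mass m releases at most m c^2.

```python
import math

C = 299_792_458.0           # speed of light in m/s
SECONDS_PER_YEAR = 3.156e7  # approximate length of a year in seconds

def lorentz_gamma(beta):
    """Lorentz factor for speed beta = v / c."""
    return 1.0 / math.sqrt(1.0 - beta * beta)

def kinetic_energy_per_rest_energy(beta):
    """Kinetic energy in units of m c^2, i.e. gamma - 1."""
    return lorentz_gamma(beta) - 1.0

ke_fast = kinetic_energy_per_rest_energy(0.9)
ke_slow = kinetic_energy_per_rest_energy(0.05)
print(f"KE / mc^2 at 0.90c: {ke_fast:.3f}")
print(f"KE / mc^2 at 0.05c: {ke_slow:.5f}")
print(f"fraction of KE shed on the way down: {1 - ke_slow / ke_fast:.4f}")

# Naive average deceleration (assumption: constant, non-relativistic bookkeeping)
avg_decel = (0.9 - 0.05) * C / (10_000 * SECONDS_PER_YEAR)
print(f"average deceleration over 10k years: {avg_decel:.2e} m/s^2")
```

At 0.9c the kinetic energy is about 1.29 m c^2, and almost all of it has to be shed (and dissipated) before 0.05c is reached, which is why problem (3) matters.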

Hidden universal expansion: stopping runaways

gathaung, 7y

What are your scenarios for interstellar warfare? The question obviously depends on whatever turns out to be the technically mature way of violent conflict resolution.

Let me propose a naive default guess:

Small technically mature von-neumann probe meets primitive civilization or unsettled system: probe wins.

Small technically mature von-neumann probe meets system with technically almost-mature inhabitants: probe cannot even make problems.

System with dyson swarm + AI: Unassailable on short timescales. Impossible to profitably invade. Maybe sling another star at it if you control the stellar neighbourhood.

In this scenario, interstellar warfare is a matter of land-grabbing: Spam the entire sky with probes, moving as fast as possible, dyson a fraction of stars to keep up the expansion front, and fortify all other systems. "Fortify" might just mean "build and maintain a couple thousand tons of observatories & industrial base", i.e. almost nothing: One just needs enough headstart to win any inner-system race against later von-neumann probes. This is relevant if the colonizer has reasons to keep most systems mostly virgin, and is compatible with the silent sky.

In this scenario, if we saw an expansion front, we would rush to move from category (1) to (2); this is slightly bad for the big colonizer.

What does "flee" mean in this context? It would mean rushing to grab a bigger slice of the pie. Do I understand you correctly there?

On the other hand, the game-theory appears to suggest that colonization speed dominates stealth, all the time: The only reaction move we have is to do what we should do anyway, if we care about colonizing the universe (if we don't care then we don't need to react at all, just keep our system).

So, in summary, I do not understand how the red civilization intends to influence our decision processes by staying stealthy.

Open thread, May 15 - May 21, 2017

gathaung, 7y

You should strive to maximize utility of your pattern, averaged over both subjective probability (uncertainty) and squared amplitude of wave-function.

If you include the latter, then it all adds up to normalcy.

If you select a state of the MWI-world according to born rule (i.e. using squared amplitude of the wave-function), then this world-state will, with overwhelming probability, be compatible with causality, entropy increase over time, and a mostly classic history, involving natural selection yielding patterns that are good at maximizing their squared-amplitude-weighted spread, i.e. DNA and brains that care about squared-amplitude (even if they don't know it).

Of course this is a non-answer to your question. Also, we have not yet finished the necessary math to prove that this non-answer is internally consistent (we=mankind), but I think this is (a) plausible, (b) the gist of what EY wrote on the topic, and (c) definitely not an original insight by EY / the sequences.

Open thread, Mar. 20 - Mar. 26, 2017

gathaung, 7y

It was not my intention to make fun of Viliam; I apologize if my comment gave this impression.

I did want to make fun of the institution of Mensa, and stand by them deserving some good-natured ridicule.

I agree with your charitable interpretation about what an IQ of 176 might actually mean; thanks for stating this in such a clear form.

That is not dead which can eternal lie: the aestivation hypothesis for resolving Fermi's paradox

gathaung, 7y

In Section 3, you write:

State value models require resources to produce high-value states. If happiness is the goal, using the resources to produce the maximum number of maximally happy minds (with a tradeoff between number and state depending on how utilities aggregate) would maximize value. If the goal is knowledge, the resources would be spent on processing generating knowledge and storage, and so on. For these cases the total amount of produced value increases monotonically with the amount of resources, possibly superlinearly.

I would think that superlinear scaling of utility with resources is incompatible with the proposed resolution of the Fermi paradox. Why?

Superlinear scaling of utility means (ignoring detailed numbers) that e.g. a distribution of 1% chance of 1e63 bit-erasures + 99% of fast extinction is preferable to almost certain 1e60 bit erasures. This seems (1) dubious from an, admittedly human-centric, common sense perspective, and more rigorously (2) is incompatible with the observation that possibilities for immediate resource extraction which don't affect later computations are not realized. In other words: You do not propose a mechanism how a dyson-swarm to collect current energy/entropy emitted by stars would decrease the total amount of computation to be done over the life-time of the universe. Especially the energy/negative entropy contained in unused emissions of current stars appears to dissipate into un-useful background glow.
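The gamble above can be made concrete for a few utility shapes. A small sketch, assuming that fast extinction yields zero bit-erasures and using made-up utility functions (a power law, linear, square root, logarithm) purely as illustrations; none of them come from the paper.

```python
import math

P_WIN = 0.01            # chance that the gamble pays off
GAMBLE_PAYOFF = 1e63    # bit-erasures if it does
CERTAIN_PAYOFF = 1e60   # bit-erasures on the safe path

# Hypothetical utility shapes, from risk-hungry to strongly risk-averse
utilities = {
    "superlinear x^1.1": lambda x: x ** 1.1,
    "linear x": lambda x: x,
    "sqrt(x)": math.sqrt,
    "log(1 + x)": math.log1p,
}

def prefers_gamble(u):
    """True if expected utility of the gamble beats the certain payoff."""
    gamble = P_WIN * u(GAMBLE_PAYOFF) + (1 - P_WIN) * u(0.0)
    return gamble > u(CERTAIN_PAYOFF)

for name, u in utilities.items():
    print(f"{name:>18}: prefers gamble = {prefers_gamble(u)}")
```

With these particular numbers even linear utility takes the gamble; only the sublinear shapes prefer the near-certain 1e60.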

I would view the following, mostly (completely?) contained in your paper as a much more coherent proposed explanation:

(1) Sending self-replicating probes to most stars in the visible universe appears to be relatively cheap [your earlier paywalled paper]

(2) This gives rise to a much stronger winner-takes-all dynamics than just colonization of a single galaxy

(3) Most pay-off, in terms of computation, is in the far future after cooling

(4) A strongly sublinear utility of computation makes a lot of sense. I would think more in direction of poly-log, in the relevant asymptotics, than linear.

(5) This implies a focus on certainty of survival

(6) This implies a lot of possible gain from (possibly a-causal) value-trade / coexistence.

(7) After certainty of survival, this implies diversification of value. If, for example, the welfare and possible existence of alien civilizations is valued at all, then the small marginal returns on extra computations on the main goals lead to gifting them a sizable chunk of cosmic real estate (sizable in absolute, not relative terms: A billion star systems for a billion years are peanuts compared to the size of the cosmic endowment in the cold far future)

This boils down to an aestivating-zoo scenario: Someone with strongly sublinear utility function and slow discounting was first to colonize the universe, and decided to be merciful to late-comers; either for acausal trade reasons, or for terminal values. Your calculations boil down to showing the way towards a lower-bound on the amount of necessary mercy for late-comers: For example, if the first mover decided to sacrifice 1e-12 of its cosmic endowment to charity, this might be enough to explain the current silence (?).

The first-mover would send probes to virtually all star systems, which run nice semi-stealthy observatories, e.g. on an energy budget of a couple giga-watt of solar panels on asteroids. If a local civilization emerges, it could go "undercover". It appears unlikely that a locally emergent superintelligence could threaten the first colonizer: The upstart might be able to take its own home system, but invading a system that has already a couple thousand tons of technologically mature equipment appears physically infeasible, even for technically mature invaders. If the late-comer starts to colonize too many systems... well, stop their 30g probes once they arrive, containment done. If the late-comer starts to talk too loud on radio... well, ask them to stop.

In this very optimistic world, we would be quite far from "x-risk by crossing the berserker-threshold": We would be given the time and space to autonomously decide what to do with the cosmos, and afterwards be told "sorry, too late, never was an option for you; wanna join the party? Most of it is ours, but you can have a peanut!"

Question: What are the lower bounds on the charity-fraction necessary to explain the current silence? This is a more numerical question, but quite important for this hypothesis.

Note that this does not require any coordination beyond the internal coordination of the first mover: All later civs are allowed to flourish in their allotted part of the universe; it is just their expansion that is contained. This strongly reduces the effective amount of remaining filter to explain: We just need technological civilization to emerge rarely enough compared to the upper expansion bound set by the first colonizer (instead of the size of the universe). For further reductions, the first-colonizer might set upper time-of-existence bounds, e.g. offer civilizations that hit their upper bound the following deal: "hey, guys, would you mind uploading and clearing your part of space for possible future civilizations? We will pay you in more computation in the far future than you have any way of accessing in other ways. Also, this would be good manners, since your predecessors' agreement to this arrangement is the reason for your existence".

PS, on (4) "strongly sublinear utility function". If high-risk high-payoff behaviour is possible at all, then we would expect the median universe to be taken by risk-averse (sublinear utility scaling) civs, and would expect almost all risk-hungry (superlinear utility scaling) civs to self-destruct. Note that this is rational behaviour of the risk-hungry civs, and I am not criticizing them for it. However, I view this as a quite weak argument, since the only plausible risk/reward trade-off on a cosmic scale appears to be in uncertainty about terminal values (and time discounting). Or do you see plausible risk/reward trade-offs?

Also, the entire edifice collapses if the first colonizer is a negative utilitarian.

Open thread, Mar. 20 - Mar. 26, 2017

gathaung, 7y

Congrats! This means that you are a Mensa-certified very one-in-a-thousand-billion-special snowflake! If you believe in the doomsday argument then this ensures either the continued survival of bio-humans for another thousand years or widespread colonization of the solar system!

On the other hand, this puts quite the upper limit on the (institutional) numeracy of Mensa... wild guessing suggests that at least one in 10^3 people have sufficient numeracy to be incapable of testifying an IQ of 176 with a straight face, which would give us an upper bound on the NQ (numeracy quotient) of Mensa at 135.

(sorry for the snark; it is not directed at you but at the clowns at Mensa, and I am not judging anyone for having taken these guys seriously at a younger age)

Regarding your serious points: Obviously you are right, and equally obviously luck (living at the right time and encountering the right problem that you can solve) also plays a pretty important role. It is just that we do not have sensible definitions for "intelligence".

IQ is by design incapable of describing outliers, and IMHO mostly nonsense even in the bulk of the distribution (but reasonable people may disagree here). Also, even if you somehow construct a meaningful linear scale for "intelligence", then I very strongly suppose that the distribution will be very far from Gaussian at the tails (trivially so at the lower end, nontrivially so at the upper end). Also, applying the inverse error-function to ordinal scales... why?

What Value Epicycles?

gathaung, 7y

I think a nicer analogy is spectral gaps. Obviously, no reasonable finite model will be both correct and useful, outside of maybe particle physics; so you need to choose some cut-off of your model's complexity. The cheapest analogy is when you try to learn a linear model, e.g. PCA/SVD/LSA (all the same).

A good model is one that hits a nice spectral gap: Adding a couple of extra epicycles gives only a very moderate extra accuracy. If there are multiple nice spectral gaps, then you should keep in mind a hierarchy of successively more complex and accurate models. If there are no good spectral gaps, then there is no real preferred model (of course model accuracy is only partially ordered in real life). When someone proposes a specific model, you need to ask both "why not simpler? How much power does the model lose by simplification?", as well as "Why not more complex? Why is any enhancement of the model necessarily very complex?".

However, what constitutes a good spectral gap is mostly a matter of taste.
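To make the spectral-gap picture concrete, here is a toy sketch in the PCA/SVD setting. It assumes a list of strictly positive singular values; the spectrum is made up, and "largest ratio between consecutive singular values" is just one possible way to score a gap, which is exactly where the matter of taste enters.

```python
def spectral_gaps(singular_values):
    """List of (k, s[k-1] / s[k]): the gap after keeping the top k components."""
    s = sorted(singular_values, reverse=True)
    return [(k, s[k - 1] / s[k]) for k in range(1, len(s))]

def explained_variance(singular_values, k):
    """Fraction of total squared singular values captured by the top k."""
    s = sorted(singular_values, reverse=True)
    total = sum(v * v for v in s)
    return sum(v * v for v in s[:k]) / total

# Hypothetical spectrum: three strong directions, then a long flat tail
spectrum = [9.0, 8.5, 8.1, 1.2, 1.1, 1.0, 0.9]

k, ratio = max(spectral_gaps(spectrum), key=lambda kr: kr[1])
print(f"best cut-off: keep {k} components (gap ratio {ratio:.2f})")
print(f"explained variance at that cut-off: {explained_variance(spectrum, k):.3f}")
print(f"one extra component adds only: "
      f"{explained_variance(spectrum, k + 1) - explained_variance(spectrum, k):.4f}")
```

In this toy spectrum the cut after three components is the natural model: a simpler model loses a lot, and a more complex one gains very little per extra epicycle.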

Open thread, Mar. 20 - Mar. 26, 2017

gathaung, 7y

AFAIK (and wikipedia tells), this is not how IQ works. For measuring intelligence, we get an "ordinal scale", i.e. a ranking between test-subjects. An honest reporting would be "you are in the top such-and-so percent". For example, testing someone as "one-in-a-billion performant" is not even wrong; it is meaningless, since we have not administered one billion IQ tests over the course of human history, and have no idea what one-in-a-billion performance on an IQ test would look like.

Because the IQ is designed by people who would try to parse HTML by regex (I cannot think of a worse insult here), it is normalized to a normal distribution. This means that one applies the inverse error-function with SD of 15 points to the percentile data. Hence, IQ is Gaussian-by-definition. In order to compare, use e.g. python as a handy pocket calculator:

    >>> from math import erfc
    >>> iqtopercentile = lambda x: erfc((x-100)/15)/2
    >>> iqtopercentile(165)
    4.442300208692339e-10

So we see that claims of any human being having an IQ of 165+ is statistically meaningless. If you extrapolated to all of human history, an IQ of 180+ is meaningless:

    >>> iqtopercentile(180)
    2.3057198811629745e-14

Yep, by current definition you would need to test 10^14 humans to get one that manages an IQ of 180. If you test 10^12 humans and one god-like super-intelligence, then the super-intelligence gets an IQ of maybe 175 -- because you should not apply the inverse error-function to an ordinal scale, because ordinal scales cannot capture bimodals. Trying to do so invites eldritch horrors on our plane who will parse HTML with a regex.
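As a cross-check on the figures above, here is a sketch using the textbook upper tail of a normal distribution, erfc(z / sqrt(2)) / 2 with z = (x - 100) / 15. The sqrt(2) does not appear in the lambda above, whose numbers therefore correspond to an SD of about 10.6 points (15 / sqrt(2)) rather than 15. The helper names upper_tail and iq_at_rarity are made up for this illustration.

```python
from math import erfc, sqrt
from statistics import NormalDist

MEAN, SD = 100.0, 15.0

def upper_tail(iq):
    """Fraction of a Normal(100, 15) population scoring above `iq`."""
    z = (iq - MEAN) / SD
    return erfc(z / sqrt(2)) / 2

def iq_at_rarity(one_in):
    """IQ whose upper tail is 1/one_in, via the inverse normal CDF."""
    # By symmetry, take the lower-tail quantile and mirror it around the mean.
    return 2 * MEAN - NormalDist(MEAN, SD).inv_cdf(1.0 / one_in)

for iq in (165, 180):
    tail = upper_tail(iq)
    print(iq, f"{tail:.2e}", f"about 1 in {1 / tail:.3g}")

print("top scorer among 1e12 people:", round(iq_at_rarity(1e12), 1))
```

With an SD of 15 the tails come out far less extreme (roughly 7e-6 for 165 and 5e-8 for 180), so the rarity thresholds shift upward, while the objection to applying the inverse error-function to an ordinal scale is unaffected.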
