Worse Than Random

But it is an inherently odd proposition that you can get a better picture of the environment by adding noise to your sensory information - by deliberately throwing away your sensory acuity. This can only degrade the mutual information between yourself and the environment. It can only diminish what in principle can be extracted from the data.

It is certainly counterintuitive to think that, by adding noise, you can get more out of data. But it is nevertheless true.

Every detection system has a perceptual threshold, a level of stimulation needed for it to register a signal. If the system is mostly noise-free, this threshold is a ’sharp’ transition. If the system has a lot of noise, the theshold is ‘fuzzy’. The noise present at one moment might destructively interact with the signal, reducing its strength, or constructively interact, making it stronger. The result is that the threshold becomes an average; it is no longer possible to know whether the system will respond merely by considering the strength of the signal.

When dealing with a signal that is just below the threshold, a noiseless system won’t be able to perceive it at all. But a noisy system will pick out some of it - ... (read more)

3Houshalter12y

The pattern painted onto white paper can't be seen because the image is also white. If the white image is printed onto paper that has parts of it that aren't white of course it's going to be more visible. Adding noise would be the equivalent of taking the image already printed onto white paper, and just adding random static on top of it. It would be even harder to see still. What you're saying just makes no sense to me. Adding noise is just as likely to increase the existing signal as it is to decrease it. Or to make a signal appear that isn't there at all. I can't see how it's doing anything to help detect the signal.

6christopherj12y

What you're missing is that, if the signal is below the detection threshold, there is no loss if the noise pushes it farther below the detection threshold, whereas there is a gain when the noise pushes the signal above the detection threshold. Thus the noise increases sensitivity, at the cost of accuracy. (And since a lot of sensory information is redundant, the loss of accuracy is easy to work around.)

5xg1511y

In which case, you could view the image even better if you just changed the whole backdrop to gray, instead of just random parts of it. This would correspond to the "using the same knowledge to produce a superior algorithm" part of the article. As I understood it, the article specifically did not state that you can't ever improve a deterministic algorithm by adding randomness - only that this is a sign that you algorithm is crap, not that the problem fundamentally requires randomness. There should always exist a different deterministic algorithm which is more accurate than your random algorithm (at least in theory - in practice, that algorithm might have an unacceptable runtime or it would require even more knowledge than you have)

1wafflepudding10y

This post is my first experience learning about noise in algorithms, so forgive me if I seem underinformed. Two points occurred to me while reading this comment, some clarification would be great: First, while it was intriguing to read that input just below the perceptual threshold would half the time be perceived by bumping it above the threshold, it seems to me that input just above the threshold would half the time be knocked below it. So wouldn't noise lead to no gain? Just a loss in acuity? Second, I'm confused how input below the perceptual threshold is actually input. If a chair moves in front of a camera so slightly that the camera doesn't register a change in position, the input seems to me like zero, and noise loud enough to move zero past the perceptual threshold would not distinguish between movement and stillness, but go off half the time and half the time be silent. If that doesn't make sense, assume that the threshold is .1 meters, and the camera doesn't notice any movement less than that. Let's say your noise is a random number between .01 meters and -.01 meters. The chair moves .09 meters, and your noise lands on .01 meters. I wouldn't think that would cross the threshold, because the camera can't actually detect that .09 meters if it's threshold is .1. So, wouldn't the input just be 0 motion detected + .01 meters of noise = .01 meters of motion? Maybe I'm misunderstanding.

4gjm10y

Suppose you have a motion-detector that looks once per second and notices a change when the chair moves by 0.1m within a second and is completely blind to smaller changes. Then a chair moving at 0.09m/s won't trigger it at all. Now suppose you add noise of amplitude +-0.01m. Then in most seconds you still won't see anything, but sometimes (I think 1/8 of the time, if that noise is uniformly distributed) the apparent movement will be above the threshold. So now if you do some kind of aggregation of the detector output over time you'll be able to tell that the chair is moving. Yes, the cost of this is that above the threshold your performance is worse. You'll need to take averages or something of the kind to make up for it. (But: when a detector has a threshold, it usually doesn't give perfectly accurate measurements just above the threshold. You may find that even above the threshold you actually get more useful results in the presence of noise.) Another example. Suppose you are trying to detect oscillating signals (musical notes, radio waves, ...) via an analogue-to-digital converter. Let's say its resolution is 1 unit. Then a signal oscillating between -0.5 and +0.5 will not show up at all: every time you sample it you'll get zero. And any small change to the signal will make exactly no difference to the output. But if you add enough noise to that signal, it becomes detectable. You'll need to average your data (or do something broadly similar); you'll have some risk of false positives; but if you have enough data you can measure the signal pretty well even though it's well below the threshold of your ADC. [EDITED to add:] It may be worth observing that there's nothing super-special about adding random stuff for this purpose. E.g., suppose you're trying to measure some non-varying value using an analogue-to-digital converter, and the value you're trying to measure is smaller than the resolution in your ADC. You could (as discussed above) add noise and average. Bu

[-]Aaron317y60

A combination dial often has a tolerance of 2 in either direction. 20-45-35 will open a lock set to 22-33-44.

I certainly hope not! I think you intended 20-35-45 for the first or 22-44-33 for the second.

-2JasonHise11y

Cheap locks (like those used for middle school/high school lockers) have about as much variance as Eliezer claims. As horrifying as that may be.

2Good_Burning_Plastic11y

The point is that the two sentences quoted above contradict each other.

-1JasonHise11y

I disagree, a tolerance of 2 in either direction directly implies that starting from the solution you can round all the numbers up by two or round them down by two and the lock will still open. Edit: To follow up, the original claim in the article was: "20-45-35 will open a lock set to 22-44-33", or in other words, a combination off by "-2, +1, +2" will open the lock.

4Good_Burning_Plastic11y

Not as Aaron3 quoted it. (I guess EY has edited it since.)

1JasonHise11y

I somehow missed the 33-44 transpose in the quote. That would indeed be a wide variance.

[-]Patrick_(orthonormal)17y80

You might want to footnote, before anyone starts making noise about the ant example, that colony selection is not a case of group selection but a case of individual selection on the queen (and drones), since the rest of the ants don't reproduce.

[-]Joshua_Simmons17y30

Daniel C. Dennett argues that random data is useful in optimization as it's absolutely key to experimentation -- you must "wiggle" different variables, as he says, and observe changes. This is a foundational part of his argumentation in Darwin's Dangerous Idea and Freedom Evolves.

[-]Nick_Tarleton17y40

Caledonian: couldn't you always do better in such a case, in principle (ignoring resource limits), by increasing resolution?

[-]Tim_Tyler17y20

That is a fine observation from Caledonian. The noise is not being added to existing sense data, it's being added before the signal hits the receptors - but to stay in tune with the proposed scenario, we can easily imagine that the organism has already internalised the signals at that point.

Re: Daniel Dennet: that's not really right - random wiggling is better than no wiggling at all, but it is still an extremely bad way to generate variation.

[-]Caledonian217y00

Caledonian: couldn't you always do better in such a case, in principle (ignoring resource limits), by increasing resolution?

I double-checked the concept of 'optical resolution' on Wikipedia.Resolution is (roughly speaking) the ability to distinguish two dots that are close-together as different - the closer the dots can be and still distinguished, the higher the resolution, and the greater detail that can be perceived.

I think perhaps you mean 'sensitivity'. It's the ability to detect weak signals close to perceptual threshold that noise improves, not the detail.

[-]drone17y30

I'm surprised at the pushback on this rather simple straightforward point. In fact, it seems kind of beaten to death so I hope it has some cool unexpected consequence soon-to-be revealed.

Besides being easy, randomness has the amusing property of helping overcome bias.

In fact, since random search is unbiased search, it now seems that overcoming bias should be frowned on! Perhaps "Finding Effective Biases" is a better title for this blog.

[-]Nick_Tarleton17y40

Caledonian: Yes, I did. So: can't you always do better in principle by increasing sensitivity?

drone: In fact, since random search is unbiased search, it now seems that overcoming bias should be frowned on! Perhaps "Finding Effective Biases" is a better title for this blog.

"Inductive Bias"

[-]Joshua_Simmons17y00

Re: Tim Tyler

Granted -- it's certainly an incredibly inefficient way of generating variation, but it's infinitely better than no wiggling at all in that no wiggling would seem to deliver you unto the "stupid" algorithm which gets stuck at the local maxima.

In simulating biological evolution I might argue that random noise is a prerequisite to true congruence, but if the goal isn't accurate emulation and instead is efficient optimization, more engineered noise seems to be the clear means to a trained algorithm.

This leads me to believe that, althoug... (read more)

[-]Tom_McCabe217y20

"This may not sound like a profound insight, since it is true by definition. But consider - how many comic books talk about "mutation" as if it were a source of power? Mutation is random. It's the selection part, not the mutation part, that explains the trends of evolution."

I think this is a specific case of people treating optimization power as if it just drops out of the sky at random. This is certainly true for some individual humans (eg., winning the lottery), but as you point out, it can't be true for the system as a whole.

"... (read more)

[-]Ben_Jones17y00

So...noise can be useful in decision theory as long as you don't expect it to do any work. And the mistake gets easier to make the more complex your system. Sounds right enough to me.

[nitpick]

Your 'by definition' link needs a look, Eliezer.

Or imagine that the combination changes every second. In this case, 0-0-0-0, 0-0-0-0 is just as good as the randomized algorithm - no better and no worse.

If it changes every second, trying the same set of four over and over is marginally better than random.

If you've just entered 0-0-0-0 and got it wrong, then on the nex... (read more)

[-]Tim_Tyler17y00

Another perspective on the issue comes from considering lazer printer drivers - another system with thresholding. Adding noise produces better dithering than not doing so. It's not as good as Floyd-Steinberg dithering, of course - but it's often "cheaper".

[-]Caledonian217y40

Caledonian: Yes, I did. So: can't you always do better in principle by increasing sensitivity?

That's a little bit like saying that you could in principle go faster than light if you ignore relativistic effects, or that you could in principle produce a demonstration within a logical system that it is consistent if you ignore Godel's Fork.

There are lots of things we can do in principle if we ignore the fact that reality limits the principles that are valid.

As the saying goes: the difference between 'in principle' and 'in practice' is that in principle the... (read more)

[-]Jef_Allbright17y60

A few posters might want to read up on Stochastic Resonance, which was surprisingly surprising a few decades ago. I'm getting a similar impression now from recent research in the field of Compressive Sensing, which ostensibly violates the Nyquist sampling limit, highlighting the immaturity of the general understanding of information-theory.

In my opinion, there's nothing especially remarkable here other than the propensity to conflate the addition of noise to data, with the addition of "noise" (a stochastic element) to (search for) data.

This confusion appears to map very well onto the cybernetic distinction between intelligently knowing the answer and intelligently controlling for the answer.

[-]Silas17y50

@Joshua_Simmons: I got to thinking about that idea as I read today's post, but I think Eliezer_Yudkowsky answered it therein: Yes, it's important to expirment, but why must your selection of what to try out, be random? You should be able to do better by exploiting all of your knowledge about the structure of the space, so as to pick better ways to experiment. To the extent that your non-random choices of what to test do worse than random, it is because your understanding of the problem is so poor as to be worse than random.

(And of course, the only time w... (read more)

[-]Brian_Jaress217y40

How would you categorize the practice of randomly selecting the pivot element in a quicksort?

[-]Eliezer Yudkowsky17y130

Silas: @Caledonian: That's an interesting point. But are you sure the effect you describe (at science museums) isn't merely due to the brain now seeing a new color gradient in the image, rather than randomness as such? Don't you get the same effect from adding an orderly grid of dots? What about from aligning the dots along the lines of the image?

Yep. Adding a set of coherent modulations will do better than noise to improve your sensor, because you're guaranteed to get at least some modulations of a sufficiently high level, and you can subtract out the modulations afterward to arrive at a superior picture of the environment.

Remember, Eliezer_Yudkowsky's point was not that randomness can never be an improvement, but that it's always possible improve beyond what randomness would yield.

Lotta commenters seem to have entirely missed this.

0Thomas16y

How can you improve guessing which uranium atom will blow up next?

0[anonymous]16y

My best guess is a uniform distribution over all the atoms. No randomness involved. If you do select one atom at random to focus your guess on, you'll do worse than my maxentropy prior.

5[anonymous]16y

My best guess at which uranium atom will decay next is the uniform distribution over all the atoms. (Unless of course some of them are being bombarded or otherwise are asymmetric cases). If you focus your guess on a random one of the atoms, then you'll do worse (in terms of Bayesian log-score) than my deterministic choice of maxentropy prior.

[-]Unnamed16y110

Eliezer stated his point more precisely in the original post:

As a general principle, on any problem for which you know that a particular unrandomized algorithm is unusually stupid - so that a randomized algorithm seems wiser - you should be able to use the same knowledge to produce a superior derandomized algorithm.

I'd recommend engaging with that formulation of his point, rather than with Silas's summary (which is what you've quoted).

1NothingnessAbove6y

Give me a deterministic algorithm that performs worse than random on that problem and and I will show you how.

[-]GreedyAlgorithm17y80

Brian, the reason we do that is to avoid the quicksort algorithm being stupid and choosing the worst-case pivot every time. The naive deterministic choices of pivot (like "pick the first element") do poorly on many of the permutations of the input which are far more probable than 1/n! because of the types of inputs people give to sorting algorithm, namely, already or nearly-already sorted input. Picking the middle element does better because inputs sorted inside to outside are rarer, but they're still far more likely than 1/n! apiece. Picking a r... (read more)

[-]George_Weinberg217y10

Consider this scenario:

There are a large number of agents independently working on the same problem (for example, trying to find a string that hash-collides with some given string), but they cannot communicate in any way, they don't have any identification information about each other, they don't know how many other agents there are working on the problem (they aren't even sure there are any). It seems to me that each agent should decide at random where to start searching, not to fool each other but to avoid pointlessly duplicating each others' work.

Are you sure there is always something better than randomness?

6gwern16y

Here's a shot, assuming they couldn't ever coordinate even at the beginning. Each agent takes a hash function & runs it on some salient data they have access to. This could be their ID number (since they may not know the ID numbers of any of the other agents but may know their own), or maybe a memory they are certain they will have different from agents. If an agent is uncertain what hash function to use on what data, they can just pick the Schelling point, or pick randomly (which would still be better). We know there must be some difference between the agent and all the others, else in what sense are there multiple agents? The hash output is then used as the index into the array of all possible solutions. The agent works forward through the solutions and if they run out of solutions without finding a valid one, they go back to their index and work the other way. This is an improvement over each agent choosing a random index, because each agent will get a different hash on its unique identity. There may still be 'collisions' between agents, but a good hash will have fewer such collisions than each agent choosing randomly. And the subsequent search procedure guarantees that every possibility will eventually be visited by an agent. Simpler version: each agent takes their common PRNG and uses their identity as the seed, instead of, say, every agent picking '1' as the seed.

0Douglas_Knight16y

I don't know what you're suggesting, but it seems to me that you're just being specific about how to use the randomness, rather than using a different method. That's clear in your simpler version. But even in the first version, a hash pretty much is a PRNG, and you don't seem to use it in any additional way. If I understand correctly, you're saying the method with the hash is a better use of randomness than the method with the PRNG. That seems wrong to me. I think the expected number of collisions is equal, the only difference might be that the hash method has fewer random stages, thus more variance.

4gwern16y

Either 2 agents are identical and have identical environments, or they don't. * If they do, then it's hopeless: they must duplicate each other's work because if they didn't they would be different, and where could that difference come from if not the agent or the environment? N agents will perform N-1 useless searches of the array. The Fates will it. * If they don't, then the difference must be either in the agent or the environment or both 1. We might call any difference in the environment a random number generator. The agent might see a flickering light or be in a virtual world with little dust devils randomly racing around or just have access to a stream of bits - whatever. The RNG will give us a starting index. But RNGs can be non-identical but still 'collide' for the first number generated, leading to a wasted agent. (I think this is the birthday paradox: length! / length^n * (length - n)!) 2. Going back to the first point, either the random number generator is the same for 2 agents or it is different. If the RNG is the same, then the agents are right back in the identical-situation: using the RNG cannot help them act differently (choose a different index) from any of the other agents. So it is useless in this situation. The difference in the agent is useful, though. This difference could be childhood memories, or what-have-you. Let's just say it's expressed as a number m, which is less than the array length. This number will be unique among agents since otherwise the 2 agents sharing the number would be the same. We can then come up with a injective mapping from the agent's number to the indexes of the array, using a hash, for example, or perhaps just taking the number as a straight index. We get no collisions this way, while just using a random number permits collisions. Thus, going by m is better than going by the first random number. 3. But what if there are differences in both the agent & envi

0Douglas_Knight16y

If the only way you use the unique data is to feed it into a hash, you might as well be using a random number. If you get different results from the hash than from randomness, you've broken the hash.

2pengvado16y

That was my first impression too. But... isn't a hash considered to be cryptographically broken if you have a process that finds any collisions at all? Distinguishing based on the frequency of collisions (if that frequency is high enough to measure) is superfluous. edit: removed the rest of my argument, which was just equivalent to truncating hash values.

4Douglas_Knight16y

Yes, if you have few enough agents / work-flow that you're in P, then it is extremely unlikely that there will be absolute collisions, whether collisions of random numbers or hash values. But that chance is the same. You can break a hash through dumb luck! If you have lots of agents, then cryptographic security of the hash doesn't apply. But we're not talking about absolute collisions of hashes of id numbers. In talking about hashes, the assumption is that the space of id numbers and hash values are big and the space of problems we're working on is not larger. When we truncate the hash values to get work items, that's when we get collisions, at exactly the same rate as if it were random.

1gwern16y

The randomness is not as good. Even if the randomness is different from agent to agent, we still can get collisions. If we take the unique aspect of the agent, then by definition it isn't shared by other agents and so we can avoid any collision with other agents: A hash doesn't have to collide; it only has to have collisions (by the Pigeonhole Principle) if the end hash is 'smaller' than the input. If I'm using SHA512 on data that is always less than 512 bits, then I'll never get any collisions. (Let's assume SHA512 is a perfect hash; if this bothers you, replace 'SHA512' with '$FAVORITE_PERFECT_HASH'.) But using the hash isn't essential: we just need a mapping from agent to index. 'Hash' was the first term I thought of. (And the XOR trick lets us make use of a injective mapping regardless of whether it's the randomness or agent-related-data that is unique; we XOR them together and get something that is unique if either was.)

4pengvado16y

How do you know which aspect of the agent is unique, without prior communication? If it's merely that agents have so many degrees of freedom that there's a negligible probability that any two agents are identical in all aspects, then your hash output is smaller than its input. Also, you can't use the 2^512 figure for SHA-512 unless you actually want to split a 2^512 size array. If you only have, say, 20 choices to split, then 20 is the size that counts for collision frequency, no matter what hash algorithm you use. If hash-of-agent outputs are unique and your RNG is random, then the XOR is just random, not guaranteed-unique.

1gwern16y

If your array is size 20, say, then why not just take the first x bits of your identity (where 2^x=20)? (Why 'first', why not 'last'? This is another Schelling point, like choice of injective mapping.) This is a good point; I wasn't sure whether it was true when I was writing it, but since you've pointed it out, I'll assume it is. But this doesn't destroy my argument: you don't do any worse by adopting this more complex strategy. You still do just as well as a random pick. (Come to think of it: so what if you have to use the full bitstring specifying your uniqueness? You'll still do better on problems the same size as your full bitstring, and if your mapping is good, the collisions will be as 'random' as the RNGs and you won't do worse.)

[-]Mike_Plotz17y20

It's not always possible to improve beyond what randomness would yield. Consider, for example, the coin toss predicting game. Or the face-the-firing-squad game.

2Normal_Anomaly14y

In the coin toss predicting game, it is also impossible to do worse than random.

[-]Brian_Jaress217y00

GreedyAlgorithm, yes that's mostly why it's done. I'd add that it applies even when the source of the ordering is not a person. Measurement data can also follow the type of patterns you'd get by following a simple, fixed rule.

But I'd like to see it analyzed Eliezer's way.

How does the randomness tie in to acquired knowledge, and what is the superior non-random method making better use of that knowledge?

Using the median isn't it, because that generally takes longer to produce the same result.

[-]Tiiba317y00

"""What about from aligning the dots along the lines of the image?"""

Wouldn't you need to find them first?

[-]Eliezer Yudkowsky17y20

To be precise, in every case where the environment only cares about your actions and not what algorithm you use to produce them, any algorithm that can be improved by randomization can always be improved further by derandomization.

1timtyler16y

Even the passing of http://en.wikipedia.org/wiki/Diehard_tests? Randomness works pretty well!

2Eliezer Yudkowsky16y

Sure. Start with random numbers. Then run the Diehard tests over them. If the Diehard tests fail, try different random numbers. There you go, algorithm derandomized.

2timtyler16y

That's an "improvement" that's unlikely to happen before universal heat death. Improving an algorithm by adding randomness is simple in many cases - while improving it further may be totally impractical and ineffectual.

4Eliezer Yudkowsky16y

Certainly. That, however, is a mere matter of limited computing power, not of theory.

5timtyler16y

The uncertainty principle would appear to represent a problem for the thesis. Secure hardware random number generators are "really difficult" to predict. That appears to be a case where you can't - in practice - beat a random strategy - whereas a random strategy beats all kinds of stupid strategies. The "any algorithm that can be improved by randomization" is redundant - and does no useful conceptual work. Idiotic algorithms that can be improved by adding randomness are two-a-penny. The "can always be improved further by derandomization" is not true - in practice. Sometimes, you just can't find a way to do any better than the random algorithm. I think the correct principle in this area is that a deterministic algorithm can always do at least as well as a random one - provided its workings can be kept secret. The main problem case for this principle occurs when you face an opponent with a lot of resources and cryptographic skills - but with sufficient care you should be able to keep them busy until the universal heat death.

6Tyrrell_McAllister16y

I think that this is a better de-randomization: Presumably there is some set S of numbers that have been observed to pass Diehard tests reliably. The de-randomized algorithm is simply to read off S. Computationally, this is cheaper than generating your own random set and reading it off. ETA: And, if I'm not mistaken, if S has already been observed to pass Diehard tests reliably, then it is even more likely to pass them in the future than is a newly generated random set.

[-]Wei Dai16y110

It seems that randomization can serve as a general substitute for memory. It's not a perfect substitute, but whenever you don't want to, or can't, remember something for some reason, randomization might help.

Besides the example of The Absent-Minded Driver, I've realized there are other examples in cryptography:

nonces - Many encryption schemes require a unique nonce to be generated for each message to be encrypted. You can either pick random nonces, or keep a counter. But keeping a counter might be too troublesome, and you might be running in a virtual machine that can be rolled back from time to time, so it's usually better to use a random nonce, even though you'll need a longer nonce than if you used a counter (to keep the probability of collision sufficiently small).
distributed key search - You can either have a central server that hands out regions of the key space to search, or each participant can search a random region. The latter is less efficient in computing time, but more efficient in communications cost.

I might do a post on this, if I could figure out a way to think about why randomization substitutes for memory.

7pengvado16y

Let A and B be actions leading to deterministic outcomes, and let C be some lottery between A and B. A rational agent will never prefer both C>A and C>B. When you repeat the scenario without memory, the lottery is no longer exactly over choices the agent could deterministically make: the randomness is re-rolled in places where the agent doesn't get another decision. Despite what the options are labeled, you're really choosing between 2xA, 2xB, and a lottery over {2xA, 2xB, A+B}. Since the lottery contains an outcome that isn't available to the deterministic decision, it may be preferred. I think this is equivalent to the role played by observational evidence in UDT1: Observations allow a constant strategy to take different actions in different places, whereas without any observations to distinguish agent instances you have to pick one action to optimize both situations. Of course good evidence is reliably correlated with the environment whereas randomness doesn't tell you which is which, but it's better than nothing.

0Tyrrell_McAllister16y

I had some comments in this thread that outline the way that I think about this.

[-]Brian_Macker17y-20

Or you may have heard people talking about "emergence" as if it could explain complex, functional orders. People will say that the function of an ant colony emerges - as if, starting from ants that had been selected only to function as solitary individuals, the ants got together in a group for the first time and the ant colony popped right out. But ant colonies have been selected on as colonies by evolution. Optimization didn't just magically happen when the ants came together.

I don't think the point of stressing emergence is to explain via ... (read more)

[-]Ian_Maxwell17y30

@Mike Plotz: It's true that you can't do better than random in predicting (theoretical nonphysical) coin tosses, but you also can't do worse than random. As Eliezer pointed out, the claim isn't "it is always possible to to better than random", but "any algorithm which can be improved by adding randomness, can be improved even more without adding randomness."

[-]Daniel_I._Lewis17y20

Brian: How does the randomness tie in to acquired knowledge, and what is the superior non-random method making better use of that knowledge?

The knowledge in this case is your belief about the distribution of input lists. Let's say, for the sake of argument, that you get sorted lists (forwards or backwards) more often than chance, and the rest of the time you get a random permutation.

On a random list, one choice of pivot is as good as another. On a sorted list, though, the ends are always a bad choice. Picking the first item is thus stupider than averag... (read more)

[-]Stuart_Armstrong17y40

Anyone care to work out exactly how much better off 0-0-0-0 is than a random set in this case? The probability of success at each cycle goes up to 1/9999 from 1/10000 (after the first cycle).

It's obvious to those who comprehend the emergent behavior that interest rates have been set way below market rates, for too long, and that is the cause of the current crisis. By "comprehend the emergent behavior" do you mean that you have a vague intuitive feel for this, or that you have the equations relating interest rates to other factors, along with enou... (read more)

[-]MichaelHoward17y00

Would you discourage, where resources exist to find alternative strategies, use of any random element in choosing samples for statistical hypothesis testing?

[-]Caledonian217y30

Don't you get the same effect from adding an orderly grid of dots?

In that particular example, yes. Because the image is static, as is the static.

If the static could change over time, you could get a better sense of where the image lies. It's cheaper and easier - and thus 'better' - to let natural randomness produce this static, especially since significant resources would have to be expended to eliminate the random noise.

What about from aligning the dots along the lines of the image?

If we knew where the image was, we wouldn't need the dots.

To be pre

... (read more)

[-]Tom_McCabe217y40

"It is not clear this can be shown to be true. 'Improvement' depends on what is valued, and what the context permits. In the real world, the value of an algorithm depends on not only its abstract mathematical properties but the costs of implementing it in an environment for which we have only imperfect knowledge."

Eliezer specifically noted this in the post:

"Sometimes it is too expensive to take advantage of all the knowledge that we could, in theory, acquire from previous tests. Moreover, a complete enumeration or interval-skipping algorith... (read more)

[-]Silas17y40

@Caledonian and Tiiba: If we knew where the image was, we wouldn't need the dots.

Okay, let's take a step back: the scenario, as Caledonian originally stated, was that the museum people could make a patron better see the image if the museum people put random dots on the image. (Pronouns avoided for clarity.) So, the problem is framed as whether you can make someone else see an image that you already know is there, by somehow exploiting randomness. My response is that, if you already know the image is there, you can improve beyond randomness, but just put... (read more)

[-]Tim_Tyler17y00

To be precise, in every case where the environment only cares about your actions and not what algorithm you use to produce them, any algorithm that can be improved by randomization can always be improved further by derandomization.

Consider the heads-tails-edge game. Betting on edge is a deterministic strategy with a low payoff. It can be improved on by randomisation: randomly betting on heads with p=0.5 and tails with p=0.5 is a stochastic strategy which offers improved returns - and there is no deterministic strategy which produces superior results to it.

[-]George_Weinberg217y00

Isn't this trivially true? Isn't the most (time) efficient algorithm always a giant lookup table?

[-]Tim_Tyler17y00

Isn't the most (time) efficient algorithm always a giant lookup table?

That approach doesn't help an embedded observer much, if they find that they cannot determine the entries to the table - because of the uncertainty principle.

[-]Tom_McCabe217y40

"We are currently living through a crisis that is in large part due to this lack of appreciation for emergent behavior. Not only people in general but trained economists, even Nobel laureates like Paul Krugman, lack the imagination to understand the emergent behavior of free monetary systems."

"Emergence", in this instance, is an empty buzzword, see http://lesswrong.com/lw/iv/the_futility_of_emergence/. "Imagination" also seems likely to be an empty buzzword, in the sense of http://lesswrong.com/lw/jb/applause_lights/.

"pre... (read more)

[-]Brian_Jaress217y00

Daniel I. Lewis, as I said, lists can have structure even when that structure is not chosen by a person.

"Let's say, for the sake of argument, that you get sorted lists (forwards or backwards) more often than chance, and the rest of the time you get a random permutation."

Let's not say that, because it creates an artificial situation. No one would select randomly if we could assume that, yet random selection is done. In reality, lists that are bad for selecting from the middle are more common than by random chance, so random beats middle.

If ... (read more)

[-]GreedyAlgorithm17y20

Brian, you want an answer to the real-world situation? Easy. First assume you have a source of inputs that is not antagonistic, as discussed. Then measure which deterministic pivot-choice algorithms would work best on large samples of the inputs, and use the best. Median-of-three is a great pivot choosing algorithm in practice, we've found. If your source of inputs is narrower than "whatever people anywhere using my ubergeneral sort utility will input" then you may be able to do better. For example, I regularly build DFAs from language data. Part... (read more)

[-]Tim_Tyler17y00

Tom, I made it clear which comments I was addressing by quoting them.

If you really want to hack my example, you should bear in mind that the "heads", "tails", and "edge" values are produced by taking bytes from a cryptographic source of randomness, and using heads: more 0s than 1s; tails: more 1s than 0s; edge: other cases.

[-]Brian_Jaress217y00

GreedyAlgorithm,

"If your source of inputs is narrower than 'whatever people anywhere using my ubergeneral sort utility will input' then you may be able to do better."

That's actually the scenario I had in mind, and I think it's the most common. Usually, when someone does a sort, they do it with a general-purpose library function or utility.

I think most of those are actually implemented as a merge sort, which is usually faster than quicksort, but I'm not clear on how that ties in to the use of information gained during the running of the prog... (read more)

[-]Mauve17y20

Brian: FYI, mergesort isn't faster than a good quicksort. Skiena's The Algorithm Design Manual, for example, says that "experiments show that a where a properly implemented quicksort is implemented well, it is typically 2-3 times faster than mergesort or heapsort" (2nd ed., p. 129).

I'm still waiting to hear what Eliezer says about your example too, as quicksort does seem to be an example of the best known implementation making necessary use of (pseudo-) randomness, and quicksort is of course extremely well-understood.

[-]Eliezer Yudkowsky17y50

Brian, a very simple analysis would say something like, "If you think the middle element is likely to be an unusually bad pivot, exclude it as a possible pivot during random selection. If you have no such belief, why not go ahead and use them? If you do have such a belief, why let random selection occasionally select bad pivots?"

[-]Brian_Macker17y00

Stuart Armstrong ,

"By 'comprehend the emergent behavior' do you mean that you have a vague intuitive feel for this, or that you have the equations relating interest rates to other factors, along with enough mathematical theory to make specific quantitative predictions?"

If I believe that a individual or committee cannot determine a market price other than by actually observing one then why on earth do you think I am claiming to be able to "make specific quantitative predictions?"

Those economists that make the mistake of thinking they can... (read more)

2Larks15y

How about shorting the Zimbabwean Dollar? Any piece of knowledge you have, and others lack, can be acted on by taking bets.

[-]Brian_Macker17y10

"'Emergence';, in this instance, is an empty buzzword.

Buzzword in this instance is a buzzword. This sentence is merely an assertion. I read that article before I wrote my argument. The phrase, "emergent behavior" and the word "emergence" have a specific meaning and it isn't about giving a "mysterious answer to a mysterious queston".

For example, Mises can and does give a complete and non-mysterious explaination of how the business cycle is a result of fractional reserve banking. Likewise, he can explain how market pr... (read more)

[-]Stuart_Armstrong17y10

Dear Brian,

Sorry if I was overly brusque in my response. you have obviously given the subject more thought than I gave you credit for.

The line you wrote here is the critical one: BTW, that theory predicted the Great Depression before the fact, and this crash before the fact. It also predicted the existence of stagflation before the fact.

Where were these predictions? Did they represent a majority view of Austrian School economics? How is their false positive, false negative score? (basically, I'm asking for a statistical survey of Austrian school economics ... (read more)

[-]Ben_Kalafut17y20

This talk of "general principles" is rubbish from a physical scientist's perspective; were what you were saying a general principle there would be no such thing as stochastic resonance, nor would Monte Carlo integration be so useful a technique. And if there is some "derandomized" alternative to simulated annealing that is superior in finding minima (or maxima) in ugly solution spaces, it hasn't been found. "So what?" you might ask, but consider that tens or hundreds of thousands of engineers, scientists, and mathematicians ... (read more)

[-]gwern17y30

To those discussing randomized Quicksort: relevant to your discussion about more intelligent behaivour might be the Introsort algorithm.

See https://secure.wikimedia.org/wikipedia/en/wiki/Introsort

"Introsort or introspective sort is a sorting algorithm designed by David Musser in 1997. It begins with quicksort and switches to heapsort when the recursion depth exceeds a level based on (the logarithm of) the number of elements being sorted. It is the best of both worlds, with a worst-case O(n log n) runtime and practical performance comparable to quick

... (read more)

[-]PleaseInformYourself17y-20

"Or you may have heard people talking about "emergence" as if it could explain complex, functional orders. People will say that the function of an ant colony emerges - as if, starting from ants that had been selected only to function as solitary individuals, the ants got together in a group for the first time and the ant colony popped right out. But ant colonies have been selected on as colonies by evolution. Optimization didn't just magically happen when the ants came together."

There is nothing magical about emergence and emergence ... (read more)

[+]START_READING_STOP_BLOGGING17y-60

[-]Tim_Tyler17y00

There is nothing magical about emergence and emergence is a well document phenomena.

Unfortunately, the term "emergence" has come to have two meanings. One is uncontroversial. The other is drivel.

[-]Brian_Macker17y-40

"Sorry if I was overly brusque in my response."
No, I don't believe you are sorry. I think you have a particular view on economics that colors your questions. You're looking for some angle to justify those beliefs. It's quite clear that the Austrians are correct about what is occuring right now.

"Because for the moment a simple "classical economics + power law of random fluctuations (possibly giving way somewhat to a gaussian distribution for rare events)" seems a much more economical theory fitting the data ..."

Really? T... (read more)

[-]Liron16y10

This is one of my favorite posts. I'm disappointed that I never grasped this during my computer science education.

[-]thomblake16y40

"it's not impossible, but it could happen" - typo? Should be "It's not very likely, but it could happen" or something like that?

[-]pdf23ds16y10

Here's an algorithm that I've heard is either really hard to derandomize, or has been proven impossible to derandomize. (I couldn't find a reference for the latter claim.) Find an arbitrary prime between two large numbers, like 10^500 - 10^501. The problem with searching sequentially is that there are arbitrarily long stretches of composites among the naturals, and if you start somewhere in one of those you'll end up spending a lot more time before you get to the end of the stretch.

6pengvado16y

See the Polymath project on that subject. The conjecture is that it is possible to derandomize, but it hasn't been proven either way. Note that finding an algorithm isn't the hard part: if a deterministic algorithm exists, then the universal dovetail algorithm also works.

[-]gwern15y40

I recently ran into a nice example of how randomness can be improved upon but you may not want to de-randomize.

One of my personal tools is a setup which archives & preserves URLs for later reference, since pretty much every web page will disappear or die eventually. The core is a loop which reads a URL out of a text file and then run the Internet requests to archive it.

The problem is that I want to archive hundreds & thousands of URLs a week, but I can only archive 1 every 20 seconds. The file is sorted to eliminate duplicates. The obvious approach... (read more)

[-]jsteinhardt14y100

As a general principle, on any problem for which you know that a particular unrandomized algorithm is unusually stupid - so that a randomized algorithm seems wiser - you should be able to use the same knowledge to produce a superior derandomized algorithm.

This claim is at least as strong as BPP = P, which while suspected is definitely not proven, and appears to be very difficult to prove. In particular, even if BPP = P, current knowledge has no way of de-randomizing all (or even most) polynomial-time randomized algorithms.

Am I misunderstanding something here?

-1BT_Uytya14y

I believe Eliezer's point wasn't that any algoritm should be de-randomized. His point was that it is impossible to extract useful work from noise; useful work is lawful. It is more philosophical statement than practical advice. To admire beauty of entropy is to worship your own ignorance and so on. Putting aside the computational complexity, space-time tradeoff, you can easily see it: if something, in principle, can be done with help of random number generator, than the same result can be obtained without using random number generator. Random is a void; optimization processes are closing that gap by ordered patterns. If you don't know how to fill in that hole, it means that you don't have enough information about some problem; it doesn't mean that entropy is magic key to this problem. And yes, in practice the cost of obtaining that information may render the derandomized algorithm useless in practice; for example, select of the "true median" (median can be found in O(n) time) as pivot in quicksort will slow it down.

0Bugmaster14y

Upon reading the article, I had the same impression as jsteinhardt. I thought that Eliezer was, in fact, giving practical advice: "if your algorithm performs better with noise, figure out what's wrong with it and remove the noise". This advice is good in some cases, but impractical in others, just as you said in your closing paragraph. I think the reason for why I read Eliezer's statement the way I did is that I don't really see the point in making "philosophical statements" about AI algorithms. These algorithms exist to efficiently solve real problems, not to score points in abstract debates.

0BT_Uytya14y

"Philosophical statements" about AI algorithms are useful: not for algorithms themselves, but for AI researchers. AI researcher shouldn't be mistaken about Mysterious Power Of The Entropy. The "We don't understand human intelligence, hence if I wish to create an artificial one, I must not understand it either" attitude is wrong. So if I was to summarize this post, the summary was something like "Noise can't be inherently good; if entropy magically solves your problem, it means that there is a some more lawful non-magical way to solve this problem." I think it is a part of more general principle "Never leave behind parts that work, but you have no slightest idea why they work".

2Bugmaster14y

I am having trouble picturing a real AI researcher who believes in the "Mysterious Power Of The Entropy". Maybe I simply lack sufficient imagination. Still, that sounds like something a philosopher might believe in, not an AI researcher who actually implements (or designs) real AI algorithms. I guess it depends on what you mean by "magical". There are tons of stochastic algorithms out there that rely on noise explicitly. Sure, there technically does exist a "lawful non-magical way" to solve the problem these algorithms are solving stochastically, but usually it's completely infeasible due to the amount of time it would take to run. Thus, one has no choice but to use the noisy algorithms. Again, this depends on how strictly you want to interpret the rule. For example, multi-layer neural networks work very well, but they are often criticized for not being transparent. You can train the network to recognize handwriting (just for example), but you can't readily explain what all the weights mean once you'd trained it.

[-]A1987dM14y10

Omega creates one more copy of you, then he asks each of you to say either “zero” or “one”; he will give each of you $10 unless you both pick the same number. You have no idea which copy you are, and cannot communicate with the other copy before you both make your choice.

The strategies “choose 0” and “choose 1” both lose, but the strategy “choose 0 with 50% probability and 1 with 50% probability” has a 50% probability of winning.

[-][anonymous]13y30

"Randomness hath no power"? This may be true for 1-player games (i.e. noncompetitive decision theory problems), but in 2-player or multiplayer game theory situations, a mixed strategy can be optimal, and in fact in some cases a pure strategy may be the worst thing one can do (rock paper scissors is a great example). Randomness hath power, it just varies by the field one enters. In fact, according to the 2nd law of thermodynamics, randomness may be the most difficult power to resist, for the universe is constantly losing to it.

[-]A1987dM12y20

It is theoretically possible for a poorly designed "pseudo-random algorithm" to be stupid relative to the search space; for example, it might always jump in the same direction. But the "pseudo-random algorithm" has to be really shoddy for that to happen. You're only likely to get stuck with that problem if you reinvent the wheel instead of using a standard, off-the-shelf solution.

You don't need that much shoddiness to get very weird things in certain situations (IIRC there was a PRNG getting some value wrong by 20 standard deviatio... (read more)

[-]christopherj12y20

OK, let me give you another example of the lock device. Each time a code is tried, the correct code changes to (previous code) + 2571 mod 10000. You don't know this. You won't find out before opening the door, because of limited feedback. Sequential check of every code will fail, but let you know that the correct code changes (if there is a correct code). Constantly guessing the same code because you think it'll randomly change to that one will fail. Random guessing will eventually succeed. Using randomness prevents you from getting stuck due to your own s... (read more)

[-]Houshalter11y00

What if the lock has multiple combinations that are close to each other. In this case the brute force does much worse than a random search. Because the correct combinations could be 98,99,100. This is true for real combination locks btw, they usually aren't too sensitive to being off by a single number, and so multiple combinations do work.

Or an even simpler case, you need to create a single algorithm to break a large number of locks. And all the locks have the same combination. And you can't save memory or modify the algorithm after it it's finished, so ... (read more)

LESSWRONG
is fundraising!
LW

LESSWRONG
is fundraising!
LW

46

46

46