ViktorThink

Comments

NVIDIA and Microsoft release 530B parameter transformer model, Megatron-Turing NLG

It will be very interesting to see how well transformers continue to scale, and what the implications of that are.

Here are some stats:

Benchmark             Megatron-Turing NLG    GPT-3
Lambada (few shot)    0.872                  0.864
PiQA (zero shot)      0.820                  0.805
PiQA (one shot)       0.810                  0.805
PiQA (few shot)       0.832                  0.828

Megatron-Turing NLG performs better, and even though the differences are small, I've seen comparisons between smaller models where a difference of as little as 1% corresponds to a noticeable difference in intelligence when using the models for text generation.

The 2021 Less Wrong Darwin Game

Thank you, aphyer! You're absolutely right: the size is 3.85, so it needs about 0.8 energy per turn.

That gives 5/0.8 ≈ 6, and at a population that small the animal is much more likely to die out by random chance.

I redid the experiment in the Benthic biome, removed the cold resistance, and set Seeds in the biome to 1. The species now survives with a population of around 6-21, which is roughly the population of 12 we would expect.
 

The 2021 Less Wrong Darwin Game

A question about mechanics:

When I run a simulation with only this animal on Tundra:

Biome: Tundra

Venomous: No

Weapon: 0

Armour: 1

Speed: 0

Eats: Seeds

Cold (Allows survival in the Tundra): Yes

The population drops to 2-3 for a while, and then the species dies out after 25 generations.

However, I calculate the size as 1.85 (0.1 base + 0.75 armour + 1 for eating seeds), and it should only need 20% of that in food per turn, which is less than 0.4. The Tundra has 1 seed per turn, which should give 5 food. Does anyone have an explanation for why it doesn't stabilize around a population of 5/0.4 ≈ 12?

I'm not sure whether I've misunderstood the mechanics or messed something up when testing the code, but I would be grateful if anyone could clarify what should happen in this situation. I've written out my calculation as a sketch below.
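Here is the calculation written out, with my assumptions made explicit. The trait costs and the 5-food-per-seed figure are my own reading of the rules and of the simulation code, so any of them could be where I went wrong:

```python
# My understanding of the mechanics (possibly wrong; this is exactly what
# I'm asking about): size determines food needed per turn, and one seed in
# the biome provides 5 food per turn.
BASE_SIZE = 0.1
ARMOUR_SIZE = 0.75      # size added per point of armour
SEED_EATER_SIZE = 1.0   # size added for eating seeds
FOOD_FRACTION = 0.2     # food needed per turn = 20% of size
FOOD_PER_SEED = 5       # food provided per seed per turn

def expected_population(size, seeds=1):
    """Rough steady-state population: total food / food needed per animal."""
    food_needed_per_animal = FOOD_FRACTION * size
    return seeds * FOOD_PER_SEED / food_needed_per_animal

tundra_size = BASE_SIZE + 1 * ARMOUR_SIZE + SEED_EATER_SIZE   # 1.85
print(expected_population(tundra_size))   # ~13.5, i.e. the "around 12" above

# With the corrected size of 3.85 from the other comment:
print(expected_population(3.85))          # ~6.5, i.e. the "around 6" above
```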

Hardware for Transformative AI


That's a good question. I can see scenarios where the price increases either more or less than that.

In this example, the compute needed for training is the only significant factor in the price, and that is what scales at 10x the cost for every 5x increase in size. (Sadly, I can't find the source where I read this, so again, please feel free to share if you have a better method of estimation.)

Building the infrastructure for training a model of several trillion parameters could easily create a chip shortage, drastically increasing the cost of AI chips and thus making training cost far more than the estimate.

On the other hand, building such a huge infrastructure might bring significant economies of scale. For example, Google might build a TPU "gigafactory", and because of the high volume of TPUs produced, the price per TPU could decrease significantly.

Hardware for Transformative AI

“Isn't that math wrong? 17 trillion parameters is 100x more than GPT-3, so the cost should be at least 100x higher, so if the cost is $12M now it should be at least a billion dollars. I think it would be about $3B. It would probably cost a bit more than that since the scaling law will probably bend soon and more data will be needed per parameter. Also there may be inefficiencies from doing things at that scale. I'd guess maybe $10B give or take an order of magnitude.”

You are absolutely correct: the cost must be more than 100x higher if cost scales faster than the number of parameters. I have now updated the calculations and got a 727x increase in cost.
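For transparency, here is the back-of-the-envelope calculation behind that 727x figure. It simply assumes the 10x-cost-per-5x-parameters rule from the other comment holds all the way up to 17 trillion parameters, which is far from certain:

```python
import math

# Assumed scaling rule (from the other comment): training cost grows 10x for
# every 5x increase in parameter count, i.e. cost ~ N^(log 10 / log 5).
scaling_exponent = math.log(10) / math.log(5)   # ~1.43

param_ratio = 100            # 17T parameters is ~100x GPT-3's 175B
cost_multiplier = param_ratio ** scaling_exponent
print(f"cost multiplier: {cost_multiplier:.0f}x")                      # ~727x

gpt3_cost = 12e6             # the ~$12M GPT-3 training figure quoted above
print(f"estimated cost: ${gpt3_cost * cost_multiplier / 1e9:.1f}B")    # ~$8.7B
```

That lands in the same ballpark as the $10B guess above.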

“Maybe the relevant sort of AI system won't just stream-of-consciousness generate words like GPT-3 does, but rather be some sort of internal bureaucracy of prompt programming that e.g. takes notes to itself, spins off sub-routines to do various tasks like looking up facts, reviews and edits text before finalizing the product, etc., such that 10x or even 100x compute is spent per word of generated text. This would mean $1 - $10 per 700 words generated, which is maybe enough to be outcompeted by humans for many applications.”

I suspect you might be right. In the human brain, every neuron is reused tens of times in the time it takes to say a single word, so it doesn't seem unlikely that a good architecture would reuse neurons multiple times before "outputting" something. So an increase of about 10-100x, as you say, seems quite plausible.

Quadratic, not logarithmic

Maybe I misunderstood something, but do the odds of getting sick yourself really grow linearly?

Let's say that if you meet 1 person, the odds of getting corona are 1%. Taken linearly, meeting 101 people would then give a 101% chance of getting corona. Sure, it is almost linear up to about 10 people (in my example), but the risk of getting corona should follow a curve like 1-(0.99)^n if we assume the risk is equal for every person you have contact with.
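As a quick illustration (just plugging numbers into that formula, with the 1%-per-contact risk as a made-up example figure), the linear estimate and 1-(0.99)^n only separate noticeably once you get to a few dozen contacts:

```python
# Risk of catching it at least once from n contacts, assuming an independent
# 1% chance of infection per contact (an illustrative number, not an estimate).
p_per_contact = 0.01

for n in (1, 10, 50, 101, 200):
    linear = n * p_per_contact                 # naive linear estimate
    actual = 1 - (1 - p_per_contact) ** n      # 1 - 0.99^n
    print(f"n={n:4d}  linear={linear:5.2f}  1-(0.99)^n={actual:.3f}")
```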

However, I can understand the reasoning behind the risk of spreading it growing linearly with the number of people you meet.

Pseudorandomness contest: prizes, results, and analysis

In the scientific paper I mentioned in my first comment, they used different questions. Here is an example:

“The questionnaires asked for interval estimates of birth years for five famous characters from world history (Mohammed, Newton, Mozart, Napoleon, and Einstein), and the years of death for five other famous persons (Nero, Copernicus, Galileo, Shakespeare, and Lincoln).”

I tried answering these questions myself with 90% confidence intervals, and I got 7/10 correct, so it seems I am still overconfident in my answers even though I had just read about the effect. To be fair, though, 10 questions are far from enough for statistical significance.
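To put a rough number on that last point, here is a quick check (my own calculation, not from the paper) of how surprising 7/10 hits would be if the intervals really did have 90% coverage:

```python
from math import comb

def binom_pmf(k, n, p):
    """Probability of exactly k successes in n independent trials."""
    return comb(n, k) * p**k * (1 - p) ** (n - k)

# If each of the 10 intervals really captured the true value with
# probability 0.9, how likely is a result as bad as 7 or fewer hits?
p_seven_or_fewer = sum(binom_pmf(k, 10, 0.9) for k in range(8))
print(f"P(<= 7 hits out of 10 | true 90% intervals) = {p_seven_or_fewer:.3f}")  # ~0.07
```

So 7/10 on its own is suggestive of overconfidence, but not conclusive.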

Pseudorandomness contest: prizes, results, and analysis

Wow, really interesting article.

It is really interesting that the median result was negative, although strategic overconfidence, as some have pointed out, explains some of it.

Found a very interesting paper on the subject of overconfidence: https://www.researchgate.net/publication/227867941_When_90_confidence_intervals_are_50_certain_On_the_credibility_of_credible_intervals

“Estimated confidence intervals for general knowledge items are usually too narrow. We report five experiments showing that people have much less confidence in these intervals than dictated by the assigned level of confidence. For instance, 90% intervals can be associated with an estimated confidence of 50% or less (and still lower hit rates). Moreover, interval width appears to remain stable over a wide range of instructions (high and low numeric and verbal confidence levels).”

What is the currency of the future? 5 suggestions.

Yes, I agree that governments are likely to "defend" their local fiat currencies, since they have both the incentives (such as control of the currency and of issuing more of it, which they often rely on to fund budget deficits) and the means to defend them.

I personally would really like such a bank account, one that automatically invested the money in it the way I want, provided the fees were low enough.

What is the currency of the future? 5 suggestions.

Yes, and the currency also usually becomes safer (harder to "hack") as it gains more miners.
