Hoagy

Comments

How can I bet on short timelines?

I have roughly similar beliefs and have thought about the same question before.

The hope is that you could make more specific bets based on trends which are not currently clear to the world as a whole but will become apparent relatively soon. For example, I think I remember Gwern asking whether, if the scaling gains from larger NNs continue, Nvidia will become the most valuable company in the world, as the power of truly massive models/training volumes becomes apparent and they're in prime position to profit.

The problem is that shares at the frontier of AI development are already subject to a lot of hype from people with somewhat similar beliefs (e.g. committed blockchain believers, or big AI believers in a purely positive sense). These stocks are therefore already significantly overvalued by traditional metrics, and it's not obvious that NN progress is enough to generate major share price growth, at least with high enough probability to overcome the presumably very high discount rates that you have, even within the next 10 years (e.g. Nvidia's market cap is $360B, so even becoming the largest company in the world only implies a ~6x price increase, and it's hard to give this more than 15% credence in the next decade).
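To spell out the rough arithmetic (taking the largest company in the world to be worth on the order of $2T, roughly Apple's current market cap):

$$\frac{2000\ \mathrm{B}}{360\ \mathrm{B}} \approx 5.6,$$

hence the ~6x.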

It seems that if you believe specifically in short timelines then there may be companies that are particularly likely to succeed given the importance of massive models (if indeed that's the way you expect things to play out). At the moment, though, most of those in a position to take advantage seem either to be embedded in larger companies (DeepMind, big tech AI divisions) or just not public (OpenAI, most startups).

Ideally, I guess, there would be a venture capital fund you could place money into that invests in the most promising companies betting on being in a position to take commercial advantage of ML breakthroughs. I'm not aware of any such fund, but I'd certainly be interested if one exists or is being created.

Hoagy's Shortform

Question about error-correcting codes that's probably in the literature but I don't seem to be able to find the right search terms:

How can we apply error-correcting codes to logical *algorithms*, as well as bit streams?

If we want to check that a bit-stream is accurate, we know how to do this for a manageable overhead - but what happens if there's an error in the hardware that does the checking? I can't see how to construct a system that has no single point of failure - you can run the correction algorithm multiple times, but how do you compare the results without ending up back at a single point of failure?
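To make the single-point-of-failure worry concrete, here's a minimal sketch (my own toy example, nothing from the literature) of triple modular redundancy with a majority vote:

```python
from collections import Counter

def compute_on_unit(f, x):
    """Stand-in for running computation f(x) on one (possibly faulty) hardware unit."""
    return f(x)

def majority_vote(results):
    """Reconcile redundant results by majority vote.

    This voter is itself the single point of failure I'm worried about:
    if the hardware running *this* step is faulty, the upstream redundancy
    doesn't help.
    """
    value, _count = Counter(results).most_common(1)[0]
    return value

def redundant_compute(f, x, copies=3):
    # Run the same computation on several (ideally independent) units...
    results = [compute_on_unit(f, x) for _ in range(copies)]
    # ...then reconcile them - but this reconciliation step is unprotected.
    return majority_vote(results)

print(redundant_compute(lambda n: n * n, 7))  # -> 49, absent hardware faults
```

You can of course vote over the voters too, but then something has to compare those outputs, and that comparison seems to become the new single point of failure - that regress is what I'm asking about.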

Anyone know any relevant papers or got a cool solution?

Interested in this for the stability of computronium-based futures!

Developmental Stages of GPTs

I agree that this is the biggest concern with these models, and the GPT-n series running out of steam wouldn't be a huge relief. It looks likely that we'll have the first human-scale (in terms of parameters) NNs before 2026 - Metaculus, 81% as of 13.08.2020.

Does anybody know of any work that analyses how quickly, once the first NN crosses a given parameter barrier, other architectures are also tried at that scale? If no-one's done it yet, I'll have a look at scraping the data from Papers With Code's databases on e.g. ImageNet models; that might also be able to answer your question about how many architectures have been tried at >100B parameters.
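For reference, the analysis itself should be straightforward; a rough sketch, assuming I can export the data into a CSV with columns for model, architecture, parameter count and publication date (placeholder names, not the actual Papers With Code schema):

```python
import pandas as pd

# Hypothetical export of leaderboard entries - these column names are my own
# placeholders, not Papers With Code's actual schema.
df = pd.read_csv("pwc_models.csv", parse_dates=["date"])  # model, architecture, parameters, date

def days_until_followed(df, threshold):
    """Once the first model crosses `threshold` parameters, how many days until
    a model with a *different* architecture reaches the same scale?"""
    crossed = df[df["parameters"] >= threshold].sort_values("date")
    if crossed.empty:
        return None
    first = crossed.iloc[0]
    followers = crossed[crossed["architecture"] != first["architecture"]]
    if followers.empty:
        return None  # no other architecture has reached this scale yet
    return (followers.iloc[0]["date"] - first["date"]).days

for threshold in [1e9, 1e10, 1e11]:  # 1B, 10B, 100B parameters
    print(f"{threshold:.0e}: {days_until_followed(df, threshold)}")
```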

Preparing for "The Talk" with AI projects

Hey Daniel, I don't have time for a proper reply right now but am interested in talking about this at some point soon. I'm currently in the UK Civil Service and will be trying to speak to people in the Office for AI soon to get a feel for what's going on there, and perhaps plant some seeds of concern. I think some similar things apply.

Predicted Land Value Tax: a better tax than an unimproved land value tax

As I understand it, one of the biggest issues with a land value tax is that the existence of the tax instantly makes owning land much less valuable - the price falls by roughly the net present value of all future taxation. This is obviously in some sense part of the plan, but it causes some pretty large sudden shifts in wealth - in particular away from anyone who has a mortgage, but also from home owners in general.
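As a stylised illustration of the size of this effect (assuming a constant annual tax $T$ on the unimproved land and a discount rate $r$):

$$\Delta V = \sum_{t=1}^{\infty} \frac{T}{(1+r)^{t}} = \frac{T}{r},$$

so e.g. a tax of 1% of the land's current value per year, discounted at 4%, knocks roughly a quarter off that value overnight.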

Implementing it in a fair/politically acceptable way then seems to require either a far-off start date, a very slow taper-in, or a large programme of compensating handouts, and all of these are difficult for a government to implement given the time horizon of elections and a large, wealthy group that will be opposed to it - likely including people inside the governing party.

This isn't especially relevant to your variant, but if you're thinking about how to get efficient taxation then this is a problem worth trying to solve :)

162 benefits of coronavirus

On the numbers from The Precipice - I think the point is that the next 100 years have an estimated 1/6 chance of extinction, but also contain the power to protect us from future harm and to let humanity flourish across the universe. Extrapolating the risk from the next 100 years to an expected 600-year lifespan, and using current population forecasts as the number of humans involved, therefore seems contrary to the spirit of his model.
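To spell out the extrapolation I mean (treating the per-century risk as constant):

$$P(\text{extinction over 600 years}) \approx 1 - \left(1 - \tfrac{1}{6}\right)^{6} \approx 0.67.$$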

Soft takeoff can still lead to decisive strategic advantage

I think this points to the strategic supremacy of relevant infrastructure in these scenarios. From what I remember of the battleship era, having a design advantage didn't seem to count for that much - once a new era was entered, everyone with sufficient infrastructure switched to the new technology and the arms race started from scratch.

This feels similar to the AI scenario, where technology seems likely to spread quickly through a combination of high financial incentives, interconnected social networks, state-sponsored espionage etc. The way a serious differential emerges is more likely to be through a gap in the infrastructure needed to implement the new technology. The current world seems tilted towards infrastructure capability diffusing fast enough to prevent such gaps, but it seems possible that if we have a massive increase in economic growth then this balance is altered and infrastructure gaps emerge, creating differentials that can't easily be reversed by a few algorithm leaks.

Torture and Dust Specks and Joy--Oh my! or: Non-Archimedean Utility Functions as Pseudograded Vector Spaces

Apologies if this is not the discussion you wanted, but it's hard to engage with comparability classes without a framework for how their boundaries could even be minimally plausible.

Would you say that all types of discomfort are comparable with higher quantities of themselves? And is there always a marginally worse type of discomfort for any given negative experience? If both of these hold (and I struggle to deny either), then transitivity seems to connect the entire spectrum of negative experience. Do you think there is a way to remove the transitivity of comparability and still have a coherent system? That, to me, would be the core requirement for making dust specks and torture incomparable.
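To sketch the chain I have in mind (my notation, not the post's), writing $\sim$ for "comparable":

$$\text{dust speck} = e_0 \sim e_1 \sim e_2 \sim \dots \sim e_n = \text{torture},$$

where each $e_{i+1}$ is a marginally worse type of discomfort than $e_i$; if comparability is transitive along this chain, the two endpoints land in the same comparability class.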


Reframing Superintelligence: Comprehensive AI Services as General Intelligence

Late to the party but I'm pretty confident he's saying the opposite - that a 1 PFLOP/s system is likely to have 10 or more times the computational capacity of the human brain, which is rather terrifying.

He gives the example of Baidu's Deep Speech 2, which requires around 1 GFLOP/s to run and produces human-comparable results - a factor of 10^6 less than what the 1 PFLOP/s machine provides. He estimates that this process takes around 10^-3 of the human brain, thereby giving the estimate of a 1 PFLOP/s system being 10^3 times faster than the brain. His other examples give similar results.
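Spelling out the arithmetic behind that estimate:

$$\text{brain-equivalent compute} \approx \frac{1\ \text{GFLOP/s}}{10^{-3}} = 1\ \text{TFLOP/s}, \qquad \frac{1\ \text{PFLOP/s}}{1\ \text{TFLOP/s}} = 10^{3}.$$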

Capability amplification

An easy way to deal with this difficulty is to replace 'at least as happy with policy A as with policy B (in any situation that we think might arise in practice)' with 'at least as happy with policy A as with policy B (when averaged over the distribution of situations that we expect to arise)', though this is clearly much weaker.
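In symbols (my notation, not the post's), writing $U(A, s)$ for how happy we are with policy $A$ in situation $s$, the original requirement is pointwise,

$$\forall s:\ U(A, s) \ge U(B, s),$$

while the relaxed version only asks for it on average over the anticipated distribution $\mathcal{D}$ of situations,

$$\mathbb{E}_{s \sim \mathcal{D}}[U(A, s)] \ \ge\ \mathbb{E}_{s \sim \mathcal{D}}[U(B, s)].$$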

To me it seems that the reason this stronger sense of ordering is used is that we expect the amplification procedure to produce results where the amplified policy is strictly better than the original, but even if that weren't the case, the concept of an obstruction would still be a useful one. Perhaps it would be reasonable to take the more relaxed definition but still expect that amplification produces results that are strictly better.

I also agree with Chris below that defining an obstruction in terms of this 'better than' relation brings in serious difficulty. There are exponentially many policies that are no better than a given policy, and there may well be a subset of these that can be amplified beyond it, but as far as I can tell there's no clear way to identify them. We thus have an exponential obstacle to progress even within a partition, necessitating a stronger definition.
